We consider covariate adjusted regression (CAR), a regression method for situations where predictors and response are observed after being distorted by a multiplicative factor. The distorting factors are unknown functions of an observable covariate, where one specific distorting function is associated with each predictor or response. The dependence of both response and predictors on the same confounding covariate may alter the underlying regression relation between undistorted but unobserved predictors and response. We consider a class of highly flexible adjustment methods for parameter estimation in the underlying regression model, which is the model of interest. Asymptotic normality of the estimates is obtained by establishing a connection to varying coefficient models. These distribution results, combined with proposed consistent estimates of the asymptotic variance, are used for the construction of asymptotic confidence intervals for the regression coefficients. The proposed approach is illustrated with data on serum creatinine, and finite sample properties of the proposed procedures are investigated through a simulation study.



1. Introduction. For many statistical applications, a multiple linear regression model is a standard tool,

$$Y_{ni} = \gamma_0 + \sum_{r=1}^p \gamma_r X_{nri} + e_{ni}, \tag{1}$$

for data $(X_{nri}, Y_{ni})$, $i = 1, \dots, n$, $r = 1, \dots, p$, where $\gamma_0$ and $\gamma_r$ are unknown parameters, $Y_{ni}$ is the response, $X_{nri}$ is the $r$th predictor and $e_{ni}$ is the error term for the $i$th subject in the sample. An implicit assumption is that predictors and response are directly observable. However, in some situations both response and predictor variables may be distorted under the influence of a confounding variable. In this paper we consider a variant of (1), where one observes contaminated versions of predictors and response. Contamination of the variables in the regression model occurs through a multiplicative factor that is determined by the value of an unknown function of an observable covariate $U$. That is, instead of observing $X_{nri}$ and $Y_{ni}$, one actually observes distorted variables $\tilde X_{nri}$ and $\tilde Y_{ni}$,

$$\tilde X_{nri} = \phi_r(U_{ni}) X_{nri}, \quad r = 1, \dots, p, \qquad \tilde Y_{ni} = \psi(U_{ni}) Y_{ni}. \tag{2}$$

Here $\psi(\cdot)$ and $\phi_r(\cdot)$ are unknown smooth functions of the contaminating covariate $U$, and the available observations are $(U_{ni}, \tilde X_{nri}, \tilde Y_{ni})$.
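The following Python sketch makes the data-generating mechanism (1)-(2) concrete. It draws one sample and returns both the observed distorted data and the latent undistorted variables. The distributions, coefficients and distorting functions are borrowed from the Monte Carlo setting of Section 5.2 purely for illustration. Reading the second parameter of each normal distribution as a standard deviation is our assumption, and the function name `simulate_car` is hypothetical.

```python
import numpy as np

def simulate_car(n, gamma, sigma, rng):
    """Draw (U, X~, Y~) from model (1)-(2), plus the latent (X, Y), in the Section 5.2 setting."""
    u = rng.uniform(2.0, 6.0, size=n)                       # confounder U ~ Uniform(2, 6)
    x = np.column_stack([rng.normal(1.5, 0.7, size=n),      # latent X_1 (0.7 read as an sd)
                         rng.normal(1.0, 1.2, size=n),      # latent X_2
                         rng.normal(0.5, 1.0, size=n)])     # latent X_3
    e = rng.normal(0.0, sigma, size=n)                      # errors: mean 0, sd sigma
    y = gamma[0] + x @ gamma[1:] + e                        # latent response, model (1)
    psi = (u + 3) / 7                                       # distorting function for Y
    phi = np.column_stack([(u + 1) ** 2 / 26.3333,          # distorting functions for X_1..X_3
                           (u + 10) / 14,
                           (u + 2) ** 2 / 37.3333])
    return u, phi * x, psi * y, x, y                        # observed data first, then latent

rng = np.random.default_rng(0)
u, x_tilde, y_tilde, x, y = simulate_car(500, np.array([4.0, -1.0, 0.3, 3.0]), 0.3, rng)
```

Only `u`, `x_tilde` and `y_tilde` would be available in practice. The latent `x` and `y` are returned so that the distortion can be inspected.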

An example where a model of this type is relevant is provided by the creatinine data, which are explored further in Section 5. Here serum creatinine levels are regressed on cholesterol level and serum albumin. The observed response and the two predictors are known to depend on body mass index, measured in kg/m$^2$, which thus has a confounding effect on the regression relation. Therefore, we model this confounding as multiplicative, via model (2), where the confounding variable $U$ is taken to be body mass index. "Normalization" by weight or body mass index is common in the analysis of medical data, and refers to simply dividing the measured quantities by these confounding variables. This type of normalization implicitly assumes that the confounding is of a multiplicative nature. The adjustment considered in this paper applies to a more general class of multiplicative confounding, where the effects of the confounder are modeled by unknown distorting functions $\psi(\cdot)$ and $\phi_r(\cdot)$. This leads to flexible models that include a large class of confounding mechanisms. Reasonable identifiability conditions for these functions are

$$E\{\psi(U)\} = 1, \qquad E\{\phi_r(U)\} = 1, \quad r = 1, \dots, p, \tag{3}$$

corresponding to the assumption that the mean distorting effect vanishes.
Additional basic assumptions are that $(X_r, U, e)$ are mutually independent for $r = 1, \dots, p$, and that observations made on different subjects are independent, with $E(e_{ni}) = 0$ and $\mathrm{var}(e_{ni}) = \sigma^2$. The assumption that the underlying predictors $X_r$ and response $Y$ are independent of the contaminating variable $U$ is part of the definition of the proposed contamination setting, because it defines these unobserved, underlying variables. For that reason it cannot be checked in practice. The practical question is therefore whether these independence conditions help define interpretable latent variables of interest from their observable counterparts. In our creatinine example, the latent variables are defined to be body mass index adjusted serum protein levels and cholesterol level, which are commonly used in medical studies.
The contamination of the predictors and response in a multiplicative fashion, as given in (2), can alter the regression relation between the original response and predictors completely. It has also been shown for the case of simple linear regression that standard adjustment methods such as nonparametric partial regression or partial regression cannot adjust for the multiplicative contamination [11]. Therefore, a modified parameter estimation procedure is necessary, one which accounts for the multiplicative confounding effect of $U$. Such a procedure was proposed in [11], where consistent parameter estimation in the model (1)-(3) was established. This estimation procedure relies on the fact that regressing $\tilde Y$ on $\tilde X_1, \dots, \tilde X_p$ gives rise to a varying coefficient model. Furthermore, a main attraction of this estimation procedure is that, under the identifiability conditions of vanishing mean distorting effects, it also works for the case of additive contamination, that is, $\tilde X_{nri} = \phi_r(U_{ni}) + X_{nri}$, $\tilde Y_{ni} = \psi(U_{ni}) + Y_{ni}$, and for no contamination, that is, $\phi_r(U_{ni}) = \psi(U_{ni}) = 1$ for $r = 1, \dots, p$. Thus, the proposed estimation procedure provides a flexible and general tool for adjustment, where the specific nature of the contamination of the variables, or even its mere existence, need not be known.

The aim of this paper is to derive the asymptotic distribution of these 
parameter estimates, and to discuss applications to confidence intervals. We 
show that our proposed parameter estimates are asymptotically normal, and 
combining this result with consistent estimation of the asymptotic variance 
leads to asymptotic inference. 

The paper is organized as follows. In Section 2 we describe the model 
in detail. In Section 3 issues of estimation are discussed and the results on 
asymptotic inference are presented. Consistent estimates for the asymptotic 
variance are derived in Section 4. Applications of the proposed method to 
creatinine data and simulation studies are in Section 5. The proofs of the 
main results are assembled in Section 6, followed by the Appendix with some 
additional technical conditions and auxiliary results. 

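2. Covariate adjustment via varying coefficient regression. Consider the model (1)-(3). Writing $\tilde X_{ni} = (\tilde X_{n1i}, \dots, \tilde X_{npi})^T$, the regression of the observed response on the observed predictors leads to

$$\begin{aligned} E(\tilde Y_{ni} \mid \tilde X_{ni}, U_{ni}) &= E\{Y_{ni}\psi(U_{ni}) \mid \phi_1(U_{ni})X_{n1i}, \dots, \phi_p(U_{ni})X_{npi}, U_{ni}\} \\ &= \psi(U_{ni})\gamma_0 + \psi(U_{ni}) E\Big\{\sum_{r=1}^p \gamma_r X_{nri} + e_{ni} \,\Big|\, \phi_1(U_{ni})X_{n1i}, \dots, \phi_p(U_{ni})X_{npi}, U_{ni}\Big\}. \end{aligned}$$

Assuming that $E(e_{ni}) = 0$ and that $(e, U, X_r)$ are mutually independent for $r = 1, \dots, p$, the model reduces to

$$E(\tilde Y_{ni} \mid \tilde X_{ni}, U_{ni}) = \psi(U_{ni})\gamma_0 + \psi(U_{ni}) \sum_{r=1}^p \gamma_r \frac{\tilde X_{nri}}{\phi_r(U_{ni})}, \tag{4}$$

since, given $U_{ni}$, $X_{nri} = \tilde X_{nri}/\phi_r(U_{ni})$ and $e_{ni}$ has conditional mean zero. Defining the functions

$$\beta_0(u) = \psi(u)\gamma_0, \qquad \beta_r(u) = \gamma_r \frac{\psi(u)}{\phi_r(u)}, \quad r = 1, \dots, p, \tag{5}$$

we obtain

$$\tilde Y_{ni} = \beta_0(U_{ni}) + \sum_{r=1}^p \beta_r(U_{ni}) \tilde X_{nri} + \epsilon(U_{ni}),$$

where $\epsilon(u) = \psi(u)e$.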

We find that this is a multiple varying coefficient model, that is, an extension of regression and generalized regression models where the coefficients are allowed to vary as a smooth function of a third variable [5]. A unique feature is that both the response and predictors depend on the covariate $U$.
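The algebra leading from (2) to the varying coefficient form can be checked numerically. The sketch below uses illustrative coefficients and distorting functions (the Section 5.2 choices, used here only as an example). For random inputs it verifies that $\psi(u)Y$ coincides with $\beta_0(u) + \sum_r \beta_r(u)\tilde X_r + \psi(u)e$ when the $\beta$'s are defined by (5). The helper names are hypothetical.

```python
import numpy as np

gamma = np.array([4.0, -1.0, 0.3, 3.0])          # illustrative (gamma_0, ..., gamma_3)

def psi(u):
    return (u + 3) / 7

def phis(u):
    """Vector (phi_1(u), phi_2(u), phi_3(u)) of illustrative distorting functions."""
    return np.array([(u + 1) ** 2 / 26.3333, (u + 10) / 14, (u + 2) ** 2 / 37.3333])

def beta(u):
    """Varying coefficients (5): beta_0(u) = psi(u) gamma_0, beta_r(u) = gamma_r psi(u) / phi_r(u)."""
    return np.concatenate([[psi(u) * gamma[0]], gamma[1:] * psi(u) / phis(u)])

def both_sides(u, x, e):
    """Return Y~ computed from (1)-(2) and from the varying coefficient form."""
    y = gamma[0] + x @ gamma[1:] + e              # latent response, model (1)
    x_tilde = phis(u) * x                         # distorted predictors, model (2)
    b = beta(u)
    lhs = psi(u) * y                              # observed response Y~ = psi(U) Y
    rhs = b[0] + b[1:] @ x_tilde + psi(u) * e     # beta_0 + sum_r beta_r X~_r + epsilon(U)
    return lhs, rhs

rng = np.random.default_rng(1)
lhs, rhs = both_sides(rng.uniform(2, 6), rng.normal(size=3), rng.normal())
print(np.isclose(lhs, rhs))
```

The identity is exact because the factor $\phi_r(u)$ in $\tilde X_r$ cancels against $1/\phi_r(u)$ in $\beta_r(u)$.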

For varying coefficient models, Hoover, Rice, Wu and Yang [6] have proposed smoothing methods based on local least squares and smoothing splines, and recent approaches include a componentwise kernel method [13], a componentwise spline method [2] and a method based on local maximum likelihood estimates [1]. Wu and Yu [14] provide a review of recent developments. We derive asymptotic distributions for an estimation method that is tailored to this special model.

3. Estimation and asymptotic distributions. The estimates of the regression coefficients $\gamma_r$ will be obtained by targeting weighted averages of the smooth varying coefficient functions. Although various smoothing methods have been proposed in the literature for estimating these functions, we propose a smoothing method based on binning. The main reason for using the binning approach is its simplicity in targeting the desired weighted averages, rather than its performance in estimating the varying coefficient functions themselves. The binning approach nevertheless has similarities with earlier developments for longitudinal data in Fan and Zhang [3], who use the data collected at each fixed time point to fit a linear regression, obtaining raw estimators for the smooth varying coefficient functions.

Generalizing this idea to our independent and identically distributed data scheme, we assume that the covariate $U$ is bounded below and above, $-\infty < a \le U \le b < \infty$ for real numbers $a < b$. We divide the interval $[a, b]$ into $m$ equidistant intervals, denoted by $B_{n1}, \dots, B_{nm}$ and referred to as bins. Given $m$, the $B_{nj}$, $j = 1, \dots, m$, are fixed, but the number of $U_{ni}$'s falling into $B_{nj}$ is random and is denoted by $L_{nj}$. For every $U_{ni}$ falling in the $j$th bin, that is, $U_{ni} \in B_{nj}$, the corresponding observed predictors are $\tilde X_{n1i}, \dots, \tilde X_{npi}$ and the response is $\tilde Y_{ni}$.
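The binning step can be sketched as follows. The sketch assumes half-open bins $[a + (j-1)h, a + jh)$ with $h = (b - a)/m$, with the right endpoint $b$ assigned to the last bin; the text does not say how points on bin boundaries are handled. It returns each observation's bin label (0-based) and the counts $L_{nj}$. The function name is hypothetical.

```python
import numpy as np

def assign_bins(u, a, b, m):
    """Label each U_i with one of m equidistant bins on [a, b] and count the L_j."""
    u = np.asarray(u, dtype=float)
    if np.any((u < a) | (u > b)):
        raise ValueError("all U values must lie in [a, b]")
    labels = np.floor((u - a) / (b - a) * m).astype(int)   # bin j covers [a + j h, a + (j+1) h)
    labels = np.minimum(labels, m - 1)                     # the endpoint b goes to the last bin
    counts = np.bincount(labels, minlength=m)              # L_j, j = 0, ..., m-1 (0-based)
    return labels, counts

labels, counts = assign_bins([0.0, 0.49, 0.5, 1.0], a=0.0, b=1.0, m=2)
print(labels.tolist(), counts.tolist())  # [0, 0, 1, 1] [2, 2]
```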
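After binning the data, we fit a linear regression of $\tilde Y_{ni}$ on $\tilde X_{n1i}, \dots, \tilde X_{npi}$ using the data falling within each bin $B_{nj}$, $j = 1, \dots, m$. The least squares estimates of the resulting multiple regression for the data in the $j$th bin are denoted by $\hat\beta_{nj} = (\hat\beta_{n0j}, \dots, \hat\beta_{npj})^T$. The estimators of $\gamma_0$ and $\gamma_r$, for $r = 1, \dots, p$, are then obtained as weighted averages of the $\hat\beta_{nj}$'s, weighted according to the number of data $L_{nj}$ in the $j$th bin,

$$\hat\gamma_{n0} = \sum_{j=1}^m \frac{L_{nj}}{n} \hat\beta_{n0j} \tag{6}$$

and

$$\hat\gamma_{nr} = \frac{1}{\bar{\tilde X}_{nr}} \sum_{j=1}^m \frac{L_{nj}}{n} \hat\beta_{nrj} \bar{\tilde X}'_{nrj}, \quad r = 1, \dots, p, \tag{7}$$

where $\bar{\tilde X}_{nr} = n^{-1}\sum_{i=1}^n \tilde X_{nri}$ and $\bar{\tilde X}'_{nrj}$ is the average of the $\tilde X_{nri}$ falling in $B_{nj}$, that is, $\bar{\tilde X}'_{nrj} = L_{nj}^{-1}\sum_{i=1}^n \tilde X_{nri} I\{U_{ni} \in B_{nj}\}$ [11]. These estimates are motivated by $E\{\beta_0(U)\} = \gamma_0$ and $E\{\beta_r(U)\tilde X_r\} = \gamma_r E(X_r)$ [see (5) and (3)], together with $E(\tilde X_r) = E\{\phi_r(U)\}E(X_r) = E(X_r)$.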
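The estimators (6) and (7) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation. Bins are equidistant on $[a, b]$, every bin must contain more than $p$ points (no merging), and per-bin fits use `numpy.linalg.lstsq`. The function name `car_estimate` and the simulated example (the Section 5.2 design, reading the second normal parameter as a standard deviation) are our own choices.

```python
import numpy as np

def car_estimate(u, x_tilde, y_tilde, a, b, m):
    """Binned CAR estimates (6)-(7) of (gamma_0, gamma_1, ..., gamma_p)."""
    n, p = x_tilde.shape
    labels = np.minimum(np.floor((u - a) / (b - a) * m).astype(int), m - 1)  # equidistant bins
    gamma0 = 0.0
    weighted = np.zeros(p)                 # sum_j (L_j/n) * beta_rj * (bin mean of X~_r)
    for j in range(m):
        in_bin = labels == j
        L = int(in_bin.sum())
        if L <= p:                         # a fit with p predictors needs L_j > p, cf. (11)
            raise ValueError(f"bin {j} has only {L} points")
        design = np.column_stack([np.ones(L), x_tilde[in_bin]])
        beta_j = np.linalg.lstsq(design, y_tilde[in_bin], rcond=None)[0]   # (9)
        gamma0 += L / n * beta_j[0]                                         # (6)
        weighted += L / n * beta_j[1:] * x_tilde[in_bin].mean(axis=0)
    gamma_r = weighted / x_tilde.mean(axis=0)   # divide by the overall mean of X~_r, (7)
    return np.concatenate([[gamma0], gamma_r])

# Example on data simulated as in Section 5.2 (second normal parameter read as an sd).
rng = np.random.default_rng(0)
n = 40000
u = rng.uniform(2, 6, n)
x = np.column_stack([rng.normal(1.5, 0.7, n), rng.normal(1.0, 1.2, n), rng.normal(0.5, 1.0, n)])
y = 4 - x[:, 0] + 0.3 * x[:, 1] + 3 * x[:, 2] + rng.normal(0, 0.3, n)
x_tilde = x * np.column_stack([(u + 1) ** 2 / 26.3333, (u + 10) / 14, (u + 2) ** 2 / 37.3333])
y_tilde = (u + 3) / 7 * y
print(car_estimate(u, x_tilde, y_tilde, a=2.0, b=6.0, m=40))
```

With a sample this large the printed vector should be close to the true $(\gamma_0, \gamma_1, \gamma_2, \gamma_3) = (4, -1, 0.3, 3)$.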

We present the asymptotic distribution of the estimates $\hat\gamma_{n0}$ in (6) and $\hat\gamma_{nr}$ in (7) of $\gamma_0$ and $\gamma_r$ in model (1), as the number of subjects $n$ tends to infinity. As in typical smoothing applications, the number of bins $m = m(n)$ is required to satisfy $m \to \infty$, $n/(m \log n) \to \infty$ and $m/\sqrt n \to \infty$ as $n \to \infty$. We denote convergence in distribution by $\xrightarrow{D}$ and convergence in probability by $\xrightarrow{p}$.

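Theorem 1. Under the technical conditions (C1)-(C7) in Section 6, on event $E_n$ [defined in (12)] with $P(E_n) \to 1$ as $n \to \infty$,

$$\sqrt n(\hat\gamma_{nr} - \gamma_r) \xrightarrow{D} N(0, \sigma_r^2), \quad 0 \le r \le p,$$

where

$$\sigma_0^2 = \gamma_0^2 \,\mathrm{var}\{\psi(U)\} + \sigma^2 (\mathcal X^{-1})_{11} E\{\psi^2(U)\},$$

$$\sigma_r^2 = \frac{1}{\{E(X_r)\}^2}\Big(\gamma_r^2[E(X_r^2)E\{\psi^2(U)\} - \{E(X_r)\}^2] + \sigma^2\{E(X_r)\}^2 E\{\psi^2(U)\}(\mathcal X^{-1})_{rr} - 2\gamma_r^2[E\{\phi_r(U)\psi(U)\}E(X_r^2) - \{E(X_r)\}^2] + \gamma_r^2 \,\mathrm{var}(\tilde X_r)\Big), \quad 1 \le r \le p,$$

and

$$\mathcal X = \begin{pmatrix} 1 & E(X_1) & \cdots & E(X_p) \\ E(X_1) & E(X_1^2) & \cdots & E(X_1X_p) \\ \vdots & \vdots & \ddots & \vdots \\ E(X_p) & E(X_1X_p) & \cdots & E(X_p^2) \end{pmatrix} \tag{8}$$

is assumed to be nonsingular, according to condition (C5) in Section 6.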



D. ŞENTÜRK AND H.-G. MÜLLER

4. Estimating the asymptotic variance. The observable data are of the form $(U_{ni}, \tilde X_{ni}, \tilde Y_{ni})$, $i = 1, \dots, n$, for a sample of size $n$, where $\tilde X_{ni} = (\tilde X_{n1i}, \dots, \tilde X_{npi})^T$ are the $p$-dimensional predictors. Correspondingly, the underlying unobservable predictors, responses and errors are $(X_{ni}, Y_{ni}, e_{ni})$, $i = 1, \dots, n$, where $X_{ni} = (X_{n1i}, \dots, X_{npi})^T$. Let

$$\{(U'_{njk}, \tilde X'_{nrjk}, \tilde Y'_{njk}, X'_{nrjk}, Y'_{njk}, e'_{njk}),\ k = 1, \dots, L_{nj},\ r = 1, \dots, p\} = \{(U_{ni}, \tilde X_{nri}, \tilde Y_{ni}, X_{nri}, Y_{ni}, e_{ni}),\ i = 1, \dots, n,\ r = 1, \dots, p : U_{ni} \in B_{nj}\}$$

denote the data for which $U_{ni} \in B_{nj}$, where we refer to $(U'_{njk}, \tilde X'_{nrjk}, \tilde Y'_{njk}, X'_{nrjk}, Y'_{njk}, e'_{njk})$ as the $k$th element in bin $B_{nj}$. Further let $(U'_{nj}, \tilde X'_{nj}, \tilde Y'_{nj}, X'_{nj}, Y'_{nj}, e'_{nj})$ be the data matrix belonging to the $j$th bin, where $U'_{nj} = (U'_{nj1}, \dots, U'_{njL_{nj}})^T$, $\tilde Y'_{nj} = (\tilde Y'_{nj1}, \dots, \tilde Y'_{njL_{nj}})^T$, $Y'_{nj} = (Y'_{nj1}, \dots, Y'_{njL_{nj}})^T$ and $e'_{nj} = (e'_{nj1}, \dots, e'_{njL_{nj}})^T$. The vectors $\tilde X'_{njk} = (1, \tilde X'_{n1jk}, \dots, \tilde X'_{npjk})^T$ and $X'_{njk} = (1, X'_{n1jk}, \dots, X'_{npjk})^T$, for $k = 1, \dots, L_{nj}$, contain the $p$ components of the $k$th element in bin $j$, preceded by a 1 for the intercept, and

$$\tilde X'_{nj} = (\tilde X'_{nj1}, \dots, \tilde X'_{njL_{nj}})^T_{L_{nj} \times (p+1)}, \qquad X'_{nj} = (X'_{nj1}, \dots, X'_{njL_{nj}})^T_{L_{nj} \times (p+1)}.$$



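Then we can express the least squares estimates of the multiple regression of the observable data falling in the $j$th bin $B_{nj}$ as

$$\hat\beta_{nj} = (\hat\beta_{n0j}, \dots, \hat\beta_{npj})^T = (\tilde X'^T_{nj} \tilde X'_{nj})^{-1} \tilde X'^T_{nj} \tilde Y'_{nj}, \tag{9}$$

leading to the parameter estimates $\hat\gamma_{n0}$ and $\hat\gamma_{nr}$ given in (6) and (7), respectively, where $\bar{\tilde X}_{nr} = n^{-1}\sum_{i=1}^n \tilde X_{nri}$ and $\bar{\tilde X}'_{nrj} = L_{nj}^{-1}\sum_{k=1}^{L_{nj}} \tilde X'_{nrjk}$.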

Let $\gamma_{nj}$ be the least squares estimates of the multiple regression of the unobservable data falling into $B_{nj}$, that is,

$$\gamma_{nj} = (\gamma_{n0j}, \dots, \gamma_{npj})^T = (X'^T_{nj} X'_{nj})^{-1} X'^T_{nj} Y'_{nj}. \tag{10}$$

This quantity cannot be computed from the observed data, but it will be used in the proofs of the main results.

For the estimates given in (6) and (7) to be well defined, the least squares estimate $\hat\beta_{nj}$ must exist for each bin $B_{nj}$. This requires that the inverse of $\tilde X'^T_{nj}\tilde X'_{nj}$ is well defined, that is, $\det(\tilde X'^T_{nj}\tilde X'_{nj}) \ne 0$. Correspondingly, $\gamma_{nj}$ will exist under the condition that $\det(X'^T_{nj}X'_{nj}) \ne 0$. Define the events

$$\begin{aligned} \tilde A_n &= \Big\{\omega \in \Omega : \inf_j |\det(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})| > \zeta \text{ and } \min_j L_{nj} > p\Big\}, \\ A_n &= \Big\{\omega \in \Omega : \inf_j |\det(L_{nj}^{-1}X'^T_{nj}X'_{nj})| > \zeta \text{ and } \min_j L_{nj} > p\Big\}, \end{aligned} \tag{11}$$

where $\zeta = \min\{\rho/2, [\inf_j\{\phi_1(U^*_{nj}) \cdots \phi_p(U^*_{nj})\}]^2 \rho/2\}$, $\rho$ is as defined in (C5), $U^*_{nj} = L_{nj}^{-1}\sum_{k=1}^{L_{nj}} U'_{njk}$ is the average of the $U$'s in $B_{nj}$ and $(\Omega, \mathcal F, P)$ is the underlying probability space. On event $\tilde A_n$, $\hat\gamma_{n0}$ and $\hat\gamma_{nr}$ given in (6) and (7) are well defined, and on event $A_n$, $\gamma_{nj}$ given in (10) is well defined. Event $E_n$ in Theorems 1 and 2 is defined to be the intersection of $\tilde A_n$ and $A_n$, that is,

$$E_n = \tilde A_n \cap A_n. \tag{12}$$

It is shown in Appendix A.3 that $P(E_n) \to 1$ as $n \to \infty$.
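The two requirements that define $\tilde A_n$ in (11) can be checked on a given sample. In this sketch the threshold `zeta` is supplied by the user, because $\zeta$ in (11) depends on the unknown $\rho$ and $\phi_r$. The function name `event_holds` and the equidistant-bin labeling are our own illustrative choices.

```python
import numpy as np

def event_holds(u, x_tilde, a, b, m, zeta):
    """Check min_j L_j > p and inf_j |det(L_j^{-1} X~'_j^T X~'_j)| > zeta, as in (11)."""
    n, p = x_tilde.shape
    labels = np.minimum(np.floor((u - a) / (b - a) * m).astype(int), m - 1)
    counts = np.bincount(labels, minlength=m)
    if counts.min() <= p:                          # some bin cannot support a fit with p predictors
        return False
    smallest = np.inf
    for j in range(m):
        design = np.column_stack([np.ones(counts[j]), x_tilde[labels == j]])  # rows (1, X~'_{jk})
        smallest = min(smallest, abs(np.linalg.det(design.T @ design / counts[j])))
    return bool(smallest > zeta)
```

If $m$ is so large that $n/m \le p$, some bin must hold at most $p$ points, so the event fails regardless of `zeta`.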

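Theorem 2. Under the technical conditions (C1)-(C7) in Section 6, on event $E_n$ [defined in (12)] with $P(E_n) \to 1$ as $n \to \infty$,

$$\hat\sigma_{nr}^2 \xrightarrow{p} \sigma_r^2, \quad 0 \le r \le p,$$

where

$$\hat\sigma_{n0}^2 = \Big(\sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{n0j}^2 - \hat\gamma_{n0}^2\Big) + \Big\{\frac1n \sum_{j=1}^m\sum_{k=1}^{L_{nj}} (\tilde Y'_{njk} - \hat\beta_{n0j} - \hat\beta_{n1j}\tilde X'_{n1jk} - \cdots - \hat\beta_{npj}\tilde X'_{npjk})^2\Big\}\Big\{\sum_{j=1}^m \frac{L_{nj}}{n}(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})^{-1}_{11}\Big\},$$

$$\begin{aligned} \hat\sigma_{nr}^2 = \frac{1}{\bar{\tilde X}_{nr}^2}\Big[ &\frac1n\sum_{j=1}^m \hat\beta_{nrj}^2 \sum_{k=1}^{L_{nj}} \tilde X'^2_{nrjk} + \hat\gamma_{nr}^2\bar{\tilde X}_{nr}^2 - \frac{2\hat\gamma_{nr}}{n}\sum_{j=1}^m \hat\beta_{nrj}\sum_{k=1}^{L_{nj}} \tilde X'^2_{nrjk} + \hat\gamma_{nr}^2 s_{nr}^2 \\ &+ \Big\{\frac1n \sum_{j=1}^m\sum_{k=1}^{L_{nj}} (\tilde Y'_{njk} - \hat\beta_{n0j} - \hat\beta_{n1j}\tilde X'_{n1jk} - \cdots - \hat\beta_{npj}\tilde X'_{npjk})^2\Big\}\Big\{\sum_{j=1}^m \frac{L_{nj}}{n}\bar{\tilde X}'^2_{nrj}(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})^{-1}_{rr}\Big\}\Big], \quad 1 \le r \le p, \end{aligned}$$

and $s_{nr}^2 = (n-1)^{-1}\sum_{i=1}^n (\tilde X_{nri} - \bar{\tilde X}_{nr})^2$.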
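The following sketch computes the CAR estimates together with the variance estimates $\hat\sigma_{nr}^2$ of Theorem 2. We read $(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})^{-1}_{11}$ as the diagonal entry for the intercept. We read $(\cdot)^{-1}_{rr}$ as the diagonal entry belonging to predictor $r$, which is position $r+1$ when the intercept comes first; this indexing convention is our assumption. Bins are equidistant and must each hold more than $p$ points. The function name is hypothetical.

```python
import numpy as np

def car_with_variances(u, x_tilde, y_tilde, a, b, m):
    """Binned CAR estimates (6)-(7) and the variance estimates of Theorem 2."""
    n, p = x_tilde.shape
    labels = np.minimum(np.floor((u - a) / (b - a) * m).astype(int), m - 1)
    w = np.zeros(m)                 # L_j / n
    beta = np.zeros((m, p + 1))     # per-bin least squares fits (9)
    xbar_bin = np.zeros((m, p))     # within-bin means of X~_r
    inv_diag = np.zeros((m, p + 1)) # diagonal of (L_j^{-1} X~'^T X~')^{-1}
    sumsq = np.zeros((m, p))        # sum_k X~'_{rjk}^2 within bin j
    rss = 0.0                       # pooled residual sum of squares over all bins
    for j in range(m):
        in_bin = labels == j
        L = int(in_bin.sum())
        if L <= p:
            raise ValueError(f"bin {j} has only {L} points")
        xj, yj = x_tilde[in_bin], y_tilde[in_bin]
        design = np.column_stack([np.ones(L), xj])
        coef = np.linalg.lstsq(design, yj, rcond=None)[0]
        resid = yj - design @ coef
        beta[j] = coef
        w[j] = L / n
        xbar_bin[j] = xj.mean(axis=0)
        inv_diag[j] = np.diag(np.linalg.inv(design.T @ design / L))
        sumsq[j] = (xj ** 2).sum(axis=0)
        rss += float(resid @ resid)
    xbar = x_tilde.mean(axis=0)                                      # overall means of X~_r
    g0 = w @ beta[:, 0]                                              # (6)
    gr = (w[:, None] * beta[:, 1:] * xbar_bin).sum(axis=0) / xbar    # (7)
    s2 = x_tilde.var(axis=0, ddof=1)                                 # s_{nr}^2
    noise = rss / n                                                  # targets sigma^2 E{psi^2(U)}
    var0 = (w @ beta[:, 0] ** 2 - g0 ** 2) + noise * (w @ inv_diag[:, 0])
    var_r = ((beta[:, 1:] ** 2 * sumsq).sum(axis=0) / n
             + gr ** 2 * xbar ** 2
             - 2 * gr * (beta[:, 1:] * sumsq).sum(axis=0) / n
             + gr ** 2 * s2
             + noise * (w[:, None] * xbar_bin ** 2 * inv_diag[:, 1:]).sum(axis=0)) / xbar ** 2
    return np.concatenate([[g0], gr]), np.concatenate([[var0], var_r])

# Usage: est, var = car_with_variances(u, x_tilde, y_tilde, a=2.0, b=6.0, m=40)
```

Each line of `var_r` corresponds to one term inside the square brackets of $\hat\sigma_{nr}^2$, in the same order.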



Remark. These proposed variance estimates are motivated by the identifiability conditions, the definition of the smooth varying coefficient functions given in (5), Lemma A.3 and Lemma A.4(a). We use the consistency of $\hat\beta_{nrj}$ for the value of the function $\beta_r$ at the midpoint of the $j$th bin and the definitions of $\tilde Y'_{njk}$ and $\tilde X'_{nrjk}$. On this basis we target the quantities $\sigma^2 E\{\psi^2(U)\}$, $\gamma_0^2 E\{\psi^2(U)\}$, $\gamma_r^2 E(X_r^2)E\{\psi^2(U)\}$ and $\gamma_r^2 E\{\phi_r(U)\psi(U)\}E(X_r^2)$ with the estimators $n^{-1}\sum_{j=1}^m\sum_{k=1}^{L_{nj}}(\tilde Y'_{njk} - \hat\beta_{n0j} - \hat\beta_{n1j}\tilde X'_{n1jk} - \cdots - \hat\beta_{npj}\tilde X'_{npjk})^2$, $\sum_{j=1}^m n^{-1}L_{nj}\hat\beta_{n0j}^2$, $n^{-1}\sum_{j=1}^m \hat\beta_{nrj}^2\sum_{k=1}^{L_{nj}}\tilde X'^2_{nrjk}$ and $n^{-1}\hat\gamma_{nr}\sum_{j=1}^m \hat\beta_{nrj}\sum_{k=1}^{L_{nj}}\tilde X'^2_{nrjk}$, respectively. Furthermore, relying mainly on Lemmas A.3 and A.4(a), we target $(\mathcal X^{-1})_{11}$ and $\{E(X_r)\}^2(\mathcal X^{-1})_{rr}$ with $\sum_{j=1}^m n^{-1}L_{nj}(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})^{-1}_{11}$ and $\sum_{j=1}^m n^{-1}L_{nj}\bar{\tilde X}'^2_{nrj}(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})^{-1}_{rr}$, respectively.

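5. Applications and Monte Carlo study. Under the technical conditions (C1)-(C7) in Section 6,

$$\frac{\sqrt n}{\sigma_r}(\hat\gamma_{nr} - \gamma_r) \xrightarrow{D} N(0, 1), \quad 0 \le r \le p, \text{ as } n \to \infty. \tag{13}$$

Using the consistent estimate $\hat\sigma_{nr}^2$ of $\sigma_r^2$ proposed in Theorem 2, it follows from (13) and Slutsky's theorem that

$$\frac{\sqrt n}{\hat\sigma_{nr}}(\hat\gamma_{nr} - \gamma_r) \xrightarrow{D} N(0, 1), \quad 0 \le r \le p,$$

so that an approximate $(1 - \alpha)$ asymptotic confidence interval for $\gamma_r$ has the endpoints

$$\hat\gamma_{nr} \pm z_{\alpha/2} \frac{\hat\sigma_{nr}}{\sqrt n}. \tag{14}$$

Here $z_{\alpha/2}$ is the $(1 - \alpha/2)$th quantile of the standard Gaussian distribution.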
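Given $\hat\gamma_{nr}$ and $\hat\sigma_{nr}$, the interval (14) requires only a standard normal quantile. The minimal sketch below uses Python's standard-library `statistics.NormalDist`; the function name `car_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def car_interval(gamma_hat, sigma_hat, n, alpha=0.05):
    """Endpoints gamma_hat -/+ z_{alpha/2} * sigma_hat / sqrt(n) of the interval (14)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2)th standard normal quantile
    half_width = z * sigma_hat / math.sqrt(n)
    return gamma_hat - half_width, gamma_hat + half_width

lower, upper = car_interval(gamma_hat=2.0, sigma_hat=1.0, n=100)
print(lower, upper)
```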

5.1. Application to creatinine data. An observational study in which various laboratory and patient data were analyzed for patients with end-stage renal disease is described in [7]. To illustrate our methods, we analyzed a similar but much smaller set of data; our analysis does not provide inference for the data in [7]. Variables include serum creatinine level (CRT), cholesterol level (CH), serum albumin level (ALB) and body mass index (BMI), measured for $n = 508$ subjects. Creatinine is a waste product of muscle metabolism that is released into the blood. Since the amount produced is relatively stable, the creatinine level in the serum is determined by the rate at which it is removed, and is therefore an important indicator of renal function. We analyze the dependence of serum creatinine (response) on cholesterol level and serum albumin (predictors). An unadjusted approach would be to fit the multiple regression model CRT $= \gamma_0 + \gamma_1$ CH $+ \gamma_2$ ALB $+ e$, where $e$ is an error term, usually by least squares. Body mass index (BMI) is defined as weight/height$^2$ and is known to affect both the response and the predictors. This provides the motivation to adjust for this influence by means of the CAR model (4), (5), using body mass index as the confounder $U$.

The parameters $\gamma_0$, $\gamma_1$ and $\gamma_2$ were estimated by the CAR algorithm, and the results were compared to the estimates obtained from the least squares regression of the observed CRT on the observed CH and ALB. The estimates and the approximate 95% confidence intervals for the regression parameters obtained through both methods are displayed in Table 1.
Table 1

Parameter estimates for the regression model CRT $= \gamma_0 + \gamma_1$ CH $+ \gamma_2$ ALB $+ e$, obtained by least squares regression of $Y$ = CRT (serum creatinine level) on $X_1$ = CH (cholesterol level) and $X_2$ = ALB (serum albumin level), and alternatively by covariate adjusted regression, for $n = 508$ subjects

| Coefficient | Least squares: lower bound | Least squares: estimate | Least squares: upper bound | Covariate adjusted: lower bound | Covariate adjusted: estimate | Covariate adjusted: upper bound |
|---|---|---|---|---|---|---|
| Intercept | 1.2715 | 4.3685 | 7.4656 | 0.3679 | 3.9987 | 7.6296 |
| CH | -0.0106 | -0.0041 | 0.0023 | -0.0154 | -0.0082 | -0.0009 |
| ALB | 1.1819 | 1.9729 | 2.7639 | 1.3065 | 2.2532 | 3.2000 |

Confidence intervals at the 95% level were obtained by the standard t-statistic for least squares regression and by the proposed asymptotic intervals (14) for CAR, respectively.



The approximate confidence intervals for the CAR estimates were obtained as proposed in (14). Figure 1 shows scatter-plots of the raw estimates $(\hat\beta_{nr1}, \dots, \hat\beta_{nrm})$ in (9) versus the midpoints of the bins $(B_{n1}, \dots, B_{nm})$ for $r = 0, 1, 2$.

The implementation of the binning algorithm allows sparsely populated bins to be merged. Bin widths were chosen so that each bin contains at least $(p + 1)$ points, enough to fit the linear regression with $p$ predictors. Bins with fewer than $(p + 1)$ elements were merged with neighboring bins. For this example with $n = 508$, the average number of points per bin was 14, yielding a total of 34 bins after merging.

For least squares regression, CH was not found significant at the usual 
5% level, while ALB was found to be significant. When applying the CAR 
method, CH and ALB were both significant. As BMI increases, the slope 
parameter of serum albumin level increases exponentially, while the negative 
slope parameter of cholesterol level declines slightly. Adjusting for different 
BMI levels across patients, both serum albumin level and cholesterol level 
seem to play a significant role for the serum creatinine level. The effects of 
BMI are thus masking the true overall negative effect that CH has on CRT 
in the unadjusted regression equation. 

5.2. Monte Carlo simulation. The confounding covariate $U$ was simulated from Uniform(2, 6). The underlying unobserved multiple regression model was

$$Y = 4 - X_1 + 0.3X_2 + 3X_3 + e, \tag{15}$$

where $X_1 \sim N(1.5, 0.7)$, $X_2 \sim N(1, 1.2)$, $X_3 \sim N(0.5, 1)$ and $e \sim N(0, 0.3)$. The distortion functions were chosen as $\psi(U) = (U + 3)/7$, $\phi_1(U) = (U + 1)^2/26.3333$, $\phi_2(U) = (U + 10)/14$ and $\phi_3(U) = (U + 2)^2/37.3333$, satisfying the identifiability conditions. We conducted 1000 Monte Carlo runs with sample sizes 100, 400 and 1600. For each run, approximate 95% asymptotic confidence intervals were formed for the regression parameters by plugging the estimates $\hat\sigma_{nr}^2$, $r = 0, \dots, p$, given in Theorem 2, into (14). The estimated coverage fractions and mean interval lengths for these confidence intervals are given in Table 2. The estimated noncoverage fractions get very close to the target value 0.05 as sample size increases, and the estimated interval lengths decrease sharply.

Fig. 1. Scatter-plots of the raw estimates $(\hat\beta_{nr1}, \dots, \hat\beta_{nrm})$ versus midpoints of the bins $(B_{n1}, \dots, B_{nm})$ for $r = 0$ (top left panel), $r = 1$ (top right panel) and $r = 2$ (bottom left panel) in the CAR model CRT $= \beta_0(\mathrm{BMI}) + \beta_1(\mathrm{BMI})$CH $+ \beta_2(\mathrm{BMI})$ALB $+ \epsilon(\mathrm{BMI})$. Local polynomial smooth curves have been fitted through the scatter-plots using cross-validation bandwidth choices of h = S, 7, 7, respectively, for r = 0, 1, 2. CRT = serum creatinine level, CH = cholesterol level, ALB = serum albumin level and BMI = body mass index. Sample size is 508, and the number of bins formed is 34.
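The identifiability conditions (3) for the simulation design can be verified exactly. For $U \sim$ Uniform(2, 6) the moments of $(U + c)^k$ have a closed form. The constants 26.3333 and 37.3333 are rounded versions of 79/3 and 112/3, so the computed means equal 1 up to that rounding. A short Python check follows; the helper name is hypothetical.

```python
def uniform_power_mean(c, k, a=2.0, b=6.0):
    """E{(U + c)^k} for U ~ Uniform(a, b), via the antiderivative (u + c)^(k+1) / (k+1)."""
    return ((b + c) ** (k + 1) - (a + c) ** (k + 1)) / ((k + 1) * (b - a))

means = {
    "psi":   uniform_power_mean(3, 1) / 7,          # psi(U) = (U + 3)/7
    "phi_1": uniform_power_mean(1, 2) / 26.3333,    # phi_1(U) = (U + 1)^2 / 26.3333
    "phi_2": uniform_power_mean(10, 1) / 14,        # phi_2(U) = (U + 10)/14
    "phi_3": uniform_power_mean(2, 2) / 37.3333,    # phi_3(U) = (U + 2)^2 / 37.3333
}
for name, value in means.items():
    print(f"E[{name}(U)] = {value:.6f}")
```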

We have also carried out simulations to study the effect of the choice of $m$, the total number of bins, on the mean squared error of the CAR estimates. Under the rate conditions on $m$ given in Section 3, the estimates were found to be robust with respect to the choice of $m$.
6. Proofs of the main results. The main steps in the proofs of the two theorems are given here. The auxiliary results for these proofs are deferred to the Appendix, where they are listed as Lemmas A.1-A.4. We introduce some technical conditions:

(C1) The covariate $U$ is bounded below and above, $-\infty < a \le U \le b < \infty$ for real numbers $a < b$. The density $f(u)$ of $U$ satisfies $\inf_{a \le u \le b} f(u) > c_1 > 0$ and $\sup_{a \le u \le b} f(u) < c_2 < \infty$ for real $c_1, c_2$, and is uniformly Lipschitz continuous, that is, there exists a real number $M$ such that $\sup_{a \le u \le b}|f(u + c) - f(u)| \le M|c|$ for any real number $c$.

(C2) The variables $(e, U, X_r)$ are mutually independent for $r = 1, \dots, p$.

(C3) For the predictors, $\sup_{1 \le i \le n,\, 1 \le r \le p} |X_{nri}| \le B$ for some bound $B \in \mathbb R$.

(C4) The contamination functions $\psi(\cdot)$ and $\phi_r(\cdot)$, $1 \le r \le p$, are twice continuously differentiable, satisfying

$$E\psi(U) = 1, \qquad E\phi_r(U) = 1, \qquad \phi_r(\cdot) > 0, \quad 1 \le r \le p.$$

(C5) As $n \to \infty$, $n^{-1}X^TX \to \mathcal X$, where $X$ is the $n \times (p+1)$ design matrix of the undistorted predictors (including the intercept column) and $\mathcal X$, the limiting $(p+1) \times (p+1)$ matrix, is nonsingular, that is, $\rho = |\det(\mathcal X)| > 0$.

These are mild conditions that are satisfied in most practical situations. Bounded covariates are standard in asymptotic theory for least squares regression, as are conditions (C2) and (C5) (see [8]). The identifiability conditions stated in (C4) are equivalent to

$$E(\tilde Y \mid X) = E(Y \mid X), \qquad E(\tilde X_r \mid X_r) = X_r.$$

This means that the confounding of $Y$ by $U$ does not change the mean regression function. Some further technical conditions will be introduced in Appendix A.1; these are required to prove the auxiliary lemmas in the Appendix.

For two matrices of the same dimension, let $A \odot B$ denote the Hadamard product, where $A \odot B$ is also of the same dimension, with $(i,j)$th element equal to the product of the $(i,j)$th elements of $A$ and $B$.

Table 2

Coverage (in percent) and mean interval length for the approximate 95% asymptotic confidence intervals formed for the parameters of the regression model (15)

| $n$ | $\gamma_0$: coverage | $\gamma_0$: length | $\gamma_1$: coverage | $\gamma_1$: length | $\gamma_2$: coverage | $\gamma_2$: length | $\gamma_3$: coverage | $\gamma_3$: length |
|---|---|---|---|---|---|---|---|---|
| 100 | 90.7 | 0.56 | 90.4 | 0.32 | 91.7 | 0.20 | 96.6 | 0.73 |
| 400 | 93.4 | 0.21 | 94.1 | 0.11 | 93.4 | 0.06 | 95.5 | 0.30 |
| 1600 | 94.2 | 0.10 | 95.2 | 0.05 | 94.7 | 0.03 | 95.0 | 0.14 |

The values were obtained from 1000 Monte Carlo runs. The average number of points per bin was 5, 16 and 32 for sample sizes 100, 400 and 1600, respectively.



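Proof of Theorem 1. By Lemma A.4(b) and properties (b), (c), (e), (f) given in Appendix A.3, it holds that

$$\sup_j \big| L_{nj}^{-1}\tilde X'^T_{nj}\tilde Y'_{nj} - \Delta_{nj} \odot \{L_{nj}^{-1}X'^T_{nj}Y'_{nj}\} \big| = O_p(m^{-1})1_{(p+1)\times1}, \tag{16}$$

where

$$L_{nj}^{-1}\tilde X'^T_{nj}\tilde Y'_{nj} = \Big(L_{nj}^{-1}\sum_k \tilde Y'_{njk},\ L_{nj}^{-1}\sum_k \tilde Y'_{njk}\tilde X'_{n1jk},\ \dots,\ L_{nj}^{-1}\sum_k \tilde Y'_{njk}\tilde X'_{npjk}\Big)^T,$$

$$L_{nj}^{-1}X'^T_{nj}Y'_{nj} = \Big(L_{nj}^{-1}\sum_k Y'_{njk},\ L_{nj}^{-1}\sum_k Y'_{njk}X'_{n1jk},\ \dots,\ L_{nj}^{-1}\sum_k Y'_{njk}X'_{npjk}\Big)^T,$$

$\Delta_{nj} = \{\psi(U^*_{nj}), \psi(U^*_{nj})\phi_1(U^*_{nj}), \dots, \psi(U^*_{nj})\phi_p(U^*_{nj})\}^T$ and $1_{(p+1)\times1}$ denotes a $(p+1) \times 1$ vector of 1's. Under event $E_n$, Lemma A.3 and (16) imply that

$$\sup_j \left| \begin{pmatrix} \hat\beta_{n0j} - \psi(U^*_{nj})\gamma_{n0j} \\ \hat\beta_{n1j} - \{\psi(U^*_{nj})/\phi_1(U^*_{nj})\}\gamma_{n1j} \\ \vdots \\ \hat\beta_{npj} - \{\psi(U^*_{nj})/\phi_p(U^*_{nj})\}\gamma_{npj} \end{pmatrix} \right| = O_p(m^{-1})1_{(p+1)\times1}, \tag{17}$$

where $\gamma_{nj}$ is as defined in (10). First consider the case $r = 0$. Using (17),

$$\sqrt n(\hat\gamma_{n0} - \gamma_0) = \sqrt n\Big(\sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{n0j} - \gamma_0\Big) = \sqrt n\Big\{\sum_{j=1}^m \frac{L_{nj}}{n}\psi(U^*_{nj})\gamma_{n0j} - \gamma_0\Big\} + O_p\Big(\frac{\sqrt n}{m}\Big).$$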

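Since $Y'_{nj} = X'_{nj}\gamma + e'_{nj}$, (10) gives $\gamma_{nj} = \gamma + (X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}e'_{nj}$. By property (b), Lemma A.4(a), (b) and substituting $L_{nj}^{-1}\sum_{k=1}^{L_{nj}}\{(L_{nj}^{-1}X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}\}_{1k}e'_{njk}$ for $\{(X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}e'_{nj}\}_1$, $\sqrt n(\hat\gamma_{n0} - \gamma_0)$ further simplifies to

$$\sum_{j=1}^m\sum_{k=1}^{L_{nj}} \frac{1}{\sqrt n}\big[\gamma_0\psi(U'_{njk}) - \gamma_0 + \psi(U'_{njk})e'_{njk}\{(L_{nj}^{-1}X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}\}_{1k}\big] + O_p\Big(\frac{\sqrt n}{m}\Big). \tag{18}$$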

Since the above sum is over all bins, indexed by $j$, and over all points within the bins, indexed by $k$, it is equal to the sum over all data points, indexed by $i$, summed in a random order. We introduce the following notation. $X'_{nj(i)}$ refers to the matrix $X'_{nj}$ and $L_{nj(i)}$ to the number of points in the $j$th bin such that $U_{ni} \in B_{nj}$. Further, $\{(L_{nj(i)}^{-1}X'^T_{nj(i)}X'_{nj(i)})^{-1}X'^T_{nj(i)}\}_{rk(i)}$ is the $(r,k)$th element of the matrix $(L_{nj}^{-1}X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}$ for $1 \le r \le p$, where $U_{ni} = U'_{njk}$ is the $k$th element in the ordered sample $(U'_{nj1}, \dots, U'_{njL_{nj}})$ of points in $B_{nj}$. Thus (18) is equal to

$$\sum_{i=1}^n \frac{1}{\sqrt n}\big[\gamma_0\psi(U_{ni}) - \gamma_0 + \psi(U_{ni})e_{ni}\{(L_{nj(i)}^{-1}X'^T_{nj(i)}X'_{nj(i)})^{-1}X'^T_{nj(i)}\}_{1k(i)}\big] + O_p\Big(\frac{\sqrt n}{m}\Big). \tag{19}$$

Since $m/\sqrt n \to \infty$ as $n \to \infty$, the term $O_p(\sqrt n/m)$ is negligible. Hence $\sqrt n(\hat\gamma_{n0} - \gamma_0)$ is asymptotically equivalent to $S_{n0n}$, where

$$S_{n0t} = \sum_{i=1}^t Z_{n0i}, \qquad Z_{n0i} = \frac{1}{\sqrt n}\big[\gamma_0\psi(U_{ni}) - \gamma_0 + \psi(U_{ni})e_{ni}\{(L_{nj(i)}^{-1}X'^T_{nj(i)}X'_{nj(i)})^{-1}X'^T_{nj(i)}\}_{1k(i)}\big].$$

Let $F_{n0t}$ be the $\sigma$-field generated by $\{e_{n1}, \dots, e_{nt}, U_{n1}, \dots, U_{nt}, L_{nj(1)}, \dots, L_{nj(t)}, X'_{nj(1)}, \dots, X'_{nj(t)}\}$. Then $\{S_{n0t}, F_{n0t}, 1 \le t \le n\}$ is a mean-zero martingale for $n \ge 1$, since $E(S_{n0t}) = 0$, $E(S_{n0,t+1} \mid F_{n0t}) = S_{n0t}$ and $S_{n0t}$ is adapted to $F_{n0t}$. Since the $\sigma$-fields are nested, that is, $F_{n0t} \subseteq F_{n0,t+1}$ for all $t < n$, using Lemma A.1, $S_{n0n} \to N(0, \sigma_0^2)$ in distribution ([9], Theorem 2.3 and subsequent discussion), and Theorem 1 follows for $r = 0$.
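Next we show

$$\sqrt n \begin{pmatrix} \sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{nrj}\bar{\tilde X}'_{nrj} - \gamma_r E(X_r) \\ \bar{\tilde X}_{nr} - E(X_r) \end{pmatrix} \xrightarrow{D} N_2(0, \Sigma_r). \tag{20}$$

The asymptotic normality of $\sqrt n(\hat\gamma_{nr} - \gamma_r)$ for $r = 1, \dots, p$ will follow from this by a simple application of the $\delta$-method, since $\hat\gamma_{nr} = \bar{\tilde X}_{nr}^{-1}\sum_{j=1}^m L_{nj}n^{-1}\hat\beta_{nrj}\bar{\tilde X}'_{nrj}$ as defined in (7). By the Cramér-Wald device it is enough to show the asymptotic normality of $\sqrt n[a\{\sum_{j=1}^m L_{nj}n^{-1}\hat\beta_{nrj}\bar{\tilde X}'_{nrj} - \gamma_r E(X_r)\} + b\{\sum_{j=1}^m L_{nj}n^{-1}\bar{\tilde X}'_{nrj} - E(X_r)\}]$ for real $a$, $b$, and (20) will follow.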
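We now use (17), properties (b), (c) and Lemma A.4(a), (b), and substitute $L_{nj}^{-1}\sum_{k=1}^{L_{nj}}\{(L_{nj}^{-1}X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}\}_{rk}e'_{njk}$ for $\{(X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}e'_{nj}\}_r$. This gives

$$\sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{nrj}\bar{\tilde X}'_{nrj} = \sum_{j=1}^m \frac{L_{nj}}{n}\psi(U^*_{nj})\bar X'_{nrj}\big[\gamma_r + \{(X'^T_{nj}X'_{nj})^{-1}X'^T_{nj}e'_{nj}\}_r\big] + O_p(m^{-1})$$

and

$$\sum_{j=1}^m \frac{L_{nj}}{n}\psi(U^*_{nj})\bar X'_{nrj} = \sum_{j=1}^m\sum_{k=1}^{L_{nj}} \frac1n \psi(U'_{njk})X'_{nrjk} + O_p(m^{-1}), \qquad \sum_{j=1}^m \frac{L_{nj}}{n}\bar{\tilde X}'_{nrj} = \sum_{j=1}^m\sum_{k=1}^{L_{nj}} \frac1n \phi_r(U'_{njk})X'_{nrjk} + O_p(m^{-1}).$$

Thus, using the same notation as in (19), it holds that

$$\begin{aligned} &\sqrt n\Big[a\Big\{\sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{nrj}\bar{\tilde X}'_{nrj} - \gamma_r E(X_r)\Big\} + b\Big\{\sum_{j=1}^m \frac{L_{nj}}{n}\bar{\tilde X}'_{nrj} - E(X_r)\Big\}\Big] \\ &\quad = \sum_{i=1}^n \Big[\frac{a\gamma_r}{\sqrt n}\psi(U_{ni})X_{nri} + \frac{a}{\sqrt n}\bar X'_{nrj(i)}\psi(U_{ni})e_{ni}\{(L_{nj(i)}^{-1}X'^T_{nj(i)}X'_{nj(i)})^{-1}X'^T_{nj(i)}\}_{rk(i)} - \frac{a\gamma_r}{\sqrt n}E(X_r) + \frac{b}{\sqrt n}\{\phi_r(U_{ni})X_{nri} - E(X_r)\}\Big] + O_p\Big(\frac{\sqrt n}{m}\Big), \end{aligned}$$

where $\bar X'_{nrj(i)} = L_{nj(i)}^{-1}\sum_{k=1}^{L_{nj(i)}} X'_{nrj(i)k}$. Since $O_p(\sqrt n/m)$ is asymptotically negligible, the above term is asymptotically equivalent to $S_{nrn}$, where $S_{nrt} = \sum_{i=1}^t Z_{nri}$ and

$$Z_{nri} = \frac{a\gamma_r}{\sqrt n}\{\psi(U_{ni})X_{nri} - E(X_r)\} + \frac{a}{\sqrt n}\bar X'_{nrj(i)}\psi(U_{ni})e_{ni}\{(L_{nj(i)}^{-1}X'^T_{nj(i)}X'_{nj(i)})^{-1}X'^T_{nj(i)}\}_{rk(i)} + \frac{b}{\sqrt n}\{\phi_r(U_{ni})X_{nri} - E(X_r)\}.$$

Let $F_{nrt}$ be the $\sigma$-field generated by $\{e_{n1}, \dots, e_{nt}, U_{n1}, \dots, U_{nt}, L_{nj(1)}, \dots, L_{nj(t)}, X'_{nj(1)}, \dots, X'_{nj(t)}\}$. Then it is easy to check that $\{S_{nrt}, F_{nrt}, 1 \le t \le n\}$ is a mean-zero martingale for $n \ge 1$. Since the $\sigma$-fields are nested, that is, $F_{nrt} \subseteq F_{nr,t+1}$ for all $t < n$, using Lemma A.2, $S_{nrn} \xrightarrow{D} N(0, (a, b)\Sigma_r(a, b)^T)$. Thus (20) holds, and it follows by a simple application of the $\delta$-method that $\sqrt n(\hat\gamma_{nr} - \gamma_r) \xrightarrow{D} N(0, \sigma_r^2)$ for $r = 1, \dots, p$, where $\sigma_r^2$ is as defined in Theorem 1. □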
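The $\delta$-method step at the end of the proof can be written out explicitly. Write $\mu_r = E(X_r)$ and $\Sigma_r$ for the covariance matrix in (20). The entries $v_{11}, v_{12}, v_{22}$ below are our reconstruction from the linearized summands $Z_{nri}$, with $a$ and $b$ picking out the two components. They are not quoted from Lemma A.2, but they reproduce the expression for $\sigma_r^2$ in Theorem 1.

```latex
\begin{aligned}
&\hat\gamma_{nr} = g\Big(\sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{nrj}\bar{\tilde X}'_{nrj},\ \bar{\tilde X}_{nr}\Big),
  \qquad g(s,t) = \frac{s}{t}, \qquad \mu_r = E(X_r),\\
&\nabla g(\gamma_r\mu_r, \mu_r) = \Big(\frac{1}{\mu_r},\ -\frac{\gamma_r}{\mu_r}\Big),
  \qquad \Sigma_r = \begin{pmatrix} v_{11} & v_{12}\\ v_{12} & v_{22}\end{pmatrix},\\
&v_{11} = \gamma_r^2[E(X_r^2)E\{\psi^2(U)\} - \mu_r^2] + \sigma^2\mu_r^2E\{\psi^2(U)\}(\mathcal X^{-1})_{rr},\\
&v_{12} = \gamma_r[E\{\phi_r(U)\psi(U)\}E(X_r^2) - \mu_r^2], \qquad v_{22} = \mathrm{var}(\tilde X_r),\\
&\sigma_r^2 = \nabla g\,\Sigma_r\,\nabla g^T = \frac{v_{11} - 2\gamma_r v_{12} + \gamma_r^2 v_{22}}{\mu_r^2}.
\end{aligned}
```

Substituting $v_{11}$, $v_{12}$ and $v_{22}$ into the last line gives exactly the expression for $\sigma_r^2$ stated in Theorem 1.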



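Proof of Theorem 2. Using Lemma A.4(a) and (b), it holds on event $A_n$ that

$$\sup_j |\gamma_{nj} - \gamma| = o_p(1)1_{(p+1)\times1}, \tag{21}$$

where $\gamma = (\gamma_0, \gamma_1, \dots, \gamma_p)^T$. Combining (17) and (21), on event $E_n$,

$$\sup_j \left| \begin{pmatrix} \hat\beta_{n0j} \\ \hat\beta_{n1j} \\ \vdots \\ \hat\beta_{npj} \end{pmatrix} - \begin{pmatrix} \psi(U^*_{nj})\gamma_0 \\ \{\psi(U^*_{nj})/\phi_1(U^*_{nj})\}\gamma_1 \\ \vdots \\ \{\psi(U^*_{nj})/\phi_p(U^*_{nj})\}\gamma_p \end{pmatrix} \right| = o_p(1)1_{(p+1)\times1}. \tag{22}$$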
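By (22), properties (b), (c), (d), boundedness considerations and the law of large numbers,

$$\sum_{j=1}^m \frac{L_{nj}}{n}\hat\beta_{n0j}^2 = \gamma_0^2\sum_{j=1}^m \frac{L_{nj}}{n}\psi^2(U^*_{nj}) + o_p(1) = \frac{\gamma_0^2}{n}\sum_{i=1}^n \psi^2(U_{ni}) + o_p(1) = \gamma_0^2 E\{\psi^2(U)\} + o_p(1),$$

$$\begin{aligned} &\frac1n\sum_{j=1}^m\sum_{k=1}^{L_{nj}} (\tilde Y'_{njk} - \hat\beta_{n0j} - \hat\beta_{n1j}\tilde X'_{n1jk} - \cdots - \hat\beta_{npj}\tilde X'_{npjk})^2 \\ &\quad = \frac1n\sum_{j=1}^m\sum_{k=1}^{L_{nj}} \{\psi(U'_{njk})e'_{njk} + \gamma_0\delta_{n0jk} + \gamma_1\delta_{n1jk}X'_{n1jk} + \cdots + \gamma_p\delta_{npjk}X'_{npjk}\}^2 + o_p(1) \\ &\quad = \frac1n\sum_{i=1}^n \psi^2(U_{ni})e_{ni}^2 + o_p(1) = \sigma^2 E\{\psi^2(U)\} + o_p(1), \end{aligned}$$

$$\frac1n\sum_{j=1}^m \hat\beta_{nrj}^2\sum_{k=1}^{L_{nj}} \tilde X'^2_{nrjk} = \frac{\gamma_r^2}{n}\sum_{j=1}^m\sum_{k=1}^{L_{nj}} \psi^2(U^*_{nj})X'^2_{nrjk} + o_p(1) = \gamma_r^2 E\{\psi^2(U)\}E(X_r^2) + o_p(1)$$

and

$$\frac{\hat\gamma_{nr}}{n}\sum_{j=1}^m \hat\beta_{nrj}\sum_{k=1}^{L_{nj}} \tilde X'^2_{nrjk} = \frac{\gamma_r^2}{n}\sum_{j=1}^m\sum_{k=1}^{L_{nj}} \psi(U^*_{nj})\phi_r(U^*_{nj})X'^2_{nrjk} + o_p(1) = \gamma_r^2 E\{\psi(U)\phi_r(U)\}E(X_r^2) + o_p(1),$$

where $\delta_{n0jk}$ and $\delta_{nrjk}$ are as defined in Appendix A.3. Using Lemma A.3, Lemma A.4(a) and (31),

$$\sum_{j=1}^m \frac{L_{nj}}{n}\bar{\tilde X}'^2_{nrj}(L_{nj}^{-1}\tilde X'^T_{nj}\tilde X'_{nj})^{-1}_{rr} \xrightarrow{p} \{E(X_r)\}^2(\mathcal X^{-1})_{rr}.$$

Since $\hat\gamma_{n0} \xrightarrow{p} \gamma_0$, $\hat\gamma_{nr} \xrightarrow{p} \gamma_r$, $s_{nr}^2 \xrightarrow{p} \mathrm{var}(\tilde X_r)$ and $\bar{\tilde X}_{nr} \xrightarrow{p} E(X_r)$, the result follows. □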

APPENDIX: AUXILIARY RESULTS AND PROOFS 

A.1. Additional technical conditions. We introduce some further technical conditions:

(C6) The functions $h_1(u) = \int x g_1(x, u)\,dx$ and $h_2(u) = \int x g_2(x, u)\,dx$ are uniformly Lipschitz, where $g_1(\cdot, \cdot)$ and $g_2(\cdot, \cdot)$ are the joint density functions of $(X, U)$ and $(Xe, U)$, respectively.

(C7) The error term satisfies $E|e|^\lambda < \infty$ for $\lambda > 4$.

Conditions (C1), (C6) and (C7) are needed for the proof of Lemma A.4 given in the next section.
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A.2. Auxiliary results on martingale differences. 

Lemma A.l. Under the technical conditions (C1)-(C6), on event A Ti 
(11) the martingale differences Z n Q t satisfy the conditions 



(a) J2 E i Z noJ(\Z n ot\>e)}^0 for all e>0, 
t=i 

(b) A^f^^ 2 fora 2 >0. 

t=i 

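Proof. Let $Z_{n0t}=w_{n0t}v_{n0t}$, where $w_{n0t}=1/\sqrt n$ and

$$
v_{n0t}=\gamma_0\psi(U_{nt})+\psi(U_{nt})e_{nt}\bigl\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\bigr\}_{1k(t)}-\gamma_0=\alpha_{1nt}+\alpha_{2nt}e_{nt},
$$

where $\alpha_{1nt}=\gamma_0\psi(U_{nt})-\gamma_0$, $\alpha_{2nt}=\psi(U_{nt})\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{1k(t)}$ and $E(v_{n0t})=0$. Using (C1), (C3) and (C4), it holds on event $A_n$ that $\sup_{1\le t\le n}|\alpha_{1nt}|<c_1$ and $\sup_{1\le t\le n}|\alpha_{2nt}|<c_2$ for some $c_1,c_2>0$. Thus, it holds for $\epsilon>0$ that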

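$$
\begin{aligned}
\sum_{t=1}^{n}E\{Z_{n0t}^{2}I(|Z_{n0t}|>\epsilon)\}
&=\sum_{t=1}^{n}\int x^{2}I(|x|>\epsilon)\,dF_{w_{n0t}v_{n0t}}(x)\\
&=\sum_{t=1}^{n}\int x^{2}I(|x|>\epsilon/|w_{n0t}|)\,w_{n0t}^{2}\,dF_{v_{n0t}}(x)\\
&=n^{-1}\sum_{t=1}^{n}\int x^{2}I(|x|>\sqrt n\,\epsilon)\,dF_{v_{n0t}}(x)\\
&\le n^{-1}\sum_{t=1}^{n}\{E(v_{n0t}^{4})\}^{1/2}\{P(v_{n0t}^{2}>n\epsilon^{2})\}^{1/2}.
\end{aligned}
$$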

Now, $E(v_{n0t}^{4})$ is bounded uniformly in $n$ and $t$, since $e_{nt}$ has finite fourth moment by (C7), and

$$
P(v_{n0t}^{2}>n\epsilon^{2})=P\{(\alpha_{1nt}+\alpha_{2nt}e_{nt})^{2}>n\epsilon^{2}\}\le P(\alpha_{1nt}^{2}+\alpha_{2nt}^{2}e_{nt}^{2}+2|\alpha_{1nt}\alpha_{2nt}e_{nt}|>n\epsilon^{2})\le P(c_1^{2}+c_2^{2}e_{nt}^{2}+2c_1c_2|e_{nt}|>n\epsilon^{2}).
$$

Lemma A.1(a) follows, since $P(c_1^{2}+c_2^{2}e_{nt}^{2}+2c_1c_2|e_{nt}|>n\epsilon^{2})\to0$ as $n\to\infty$, uniformly in $t$, the $|e_{nt}|$ being i.i.d. with finite fourth moments.
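The truncation step rests on the Cauchy–Schwarz inequality $E\{v^{2}I(v^{2}>c)\}\le\{E(v^{4})\}^{1/2}\{P(v^{2}>c)\}^{1/2}$ with $c=n\epsilon^{2}$; for identically distributed $v_{n0t}$ the Lindeberg sum is just this truncated moment. A minimal numerical sketch, assuming illustrative values $c_1=1$, $c_2=2$, $\epsilon=0.5$, a uniform mean-zero $\alpha_1$ and standard normal errors (none of these choices come from the paper), checks the inequality on an empirical sample and shows the truncated moment vanishing as $n$ grows:

```python
import numpy as np

def truncated_moment_and_bound(v, threshold):
    """Empirical E[v^2 1{v^2 > threshold}] and its Cauchy-Schwarz bound
    {E(v^4)}^{1/2} {P(v^2 > threshold)}^{1/2}."""
    v2 = v ** 2
    exceed = v2 > threshold
    lhs = np.mean(v2 * exceed)
    rhs = np.sqrt(np.mean(v2 ** 2)) * np.sqrt(np.mean(exceed))
    return lhs, rhs

rng = np.random.default_rng(0)
c1, c2, eps = 1.0, 2.0, 0.5                      # illustrative constants
alpha1 = c1 * rng.uniform(-1.0, 1.0, 200_000)    # bounded, mean-zero alpha_1
e = rng.standard_normal(200_000)                 # errors with finite fourth moment
v = alpha1 + c2 * e                              # v = alpha_1 + alpha_2 e, alpha_2 = c2

results = []
for n in [10, 100, 1_000, 10_000]:
    lhs, rhs = truncated_moment_and_bound(v, n * eps ** 2)
    results.append((n, lhs, rhs))
    print(f"n={n:>6}: truncated second moment={lhs:.3e}, bound={rhs:.3e}")
```

Because the inequality also holds under the empirical distribution, the printed bound always dominates the truncated moment, and the truncated moment can only shrink as the threshold $n\epsilon^{2}$ increases.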
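The term $\Delta_{n0}^{2}$ given in Lemma A.1(b) is equal to

$$
\begin{aligned}
\Delta_{n0}^{2}={}&\gamma_0^{2}\Bigl\{n^{-1}\sum_{t}\psi^{2}(U_{nt})\Bigr\}+\gamma_0^{2}-2\gamma_0^{2}\Bigl\{n^{-1}\sum_{t}\psi(U_{nt})\Bigr\}\\
&+2\gamma_0n^{-1}\sum_{t}\psi^{2}(U_{nt})e_{nt}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{1k(t)}\\
&-2\gamma_0n^{-1}\sum_{t}\psi(U_{nt})e_{nt}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{1k(t)}\\
&+n^{-1}\sum_{t}\psi^{2}(U_{nt})e_{nt}^{2}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{1k(t)}^{2}\\
={}&T_1+\cdots+T_6.
\end{aligned}
$$

It follows from the law of large numbers and $E\{\psi(U)\}=1$ that

$$
T_1+T_2+T_3\xrightarrow{p}\gamma_0^{2}E\{\psi^{2}(U)\}+\gamma_0^{2}-2\gamma_0^{2}E\{\psi(U)\}=\gamma_0^{2}\operatorname{var}\{\psi(U)\}.
$$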
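On event $A_n$, $E(T_4\mid U,X,L_{nj})=0$ and

$$
\operatorname{var}(T_4\mid U,X,L_{nj})=\frac{4\gamma_0^{2}\sigma^{2}}{n^{2}}\sum_{t}\psi^{4}(U_{nt})\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{1k(t)}^{2}=O(n^{-1}).
$$

Thus, $E(T_4)=0$ and $\operatorname{var}(T_4)=O(n^{-1})$, implying that $T_4=O_p(n^{-1/2})$ on $A_n$. Similarly, it can be shown that $T_5=O_p(n^{-1/2})$ on $A_n$.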
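Next consider the last term $T_6$, which can also be written as

$$
T_6=n^{-1}\sum_{j=1}^{m}\sum_{k=1}^{L_{nj}}\{(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}\tilde X'^{T}_{nj}\}_{1k}^{2}\,\psi^{2}(U'_{njk})e'^{2}_{njk}.
$$

Expanding $\{(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}\tilde X'^{T}_{nj}\}_{1k}^{2}\psi^{2}(U'_{njk})e'^{2}_{njk}$ for each $k$, we get

$$
T_6=n^{-1}\sum_{j=1}^{m}\sum_{k=1}^{L_{nj}}\bigl\{(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{11}e'_{njk}\psi(U'_{njk})+(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{12}e'_{njk}\psi(U'_{njk})\tilde X'_{n1jk}+\cdots+(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{1,p+1}e'_{njk}\psi(U'_{njk})\tilde X'_{npjk}\bigr\}^{2},
$$

which by Lemma A.4(a) and the law of large numbers is equal to

$$
\begin{aligned}
&\sigma^{2}E\{\psi^{2}(U)\}\Bigl[(\mathcal X^{-1})_{11}^{2}+(\mathcal X^{-1})_{12}^{2}E(\tilde X_1^{2})+\cdots+(\mathcal X^{-1})_{1,p+1}^{2}E(\tilde X_p^{2})\\
&\qquad+\{2(\mathcal X^{-1})_{11}(\mathcal X^{-1})_{12}E(\tilde X_1)+\cdots+2(\mathcal X^{-1})_{11}(\mathcal X^{-1})_{1,p+1}E(\tilde X_p)\}\\
&\qquad+\{2(\mathcal X^{-1})_{12}(\mathcal X^{-1})_{13}E(\tilde X_1\tilde X_2)+\cdots+2(\mathcal X^{-1})_{12}(\mathcal X^{-1})_{1,p+1}E(\tilde X_1\tilde X_p)\}+\cdots\\
&\qquad+\{2(\mathcal X^{-1})_{1p}(\mathcal X^{-1})_{1,p+1}E(\tilde X_{p-1}\tilde X_p)\}\Bigr]+o_p(1)\\
&=\sigma^{2}E\{\psi^{2}(U)\}(\mathcal X^{-1}\mathcal X\mathcal X^{-1})_{11}+o_p(1)\\
&=\sigma^{2}E\{\psi^{2}(U)\}(\mathcal X^{-1})_{11}+o_p(1),
\end{aligned}
$$

where $\mathcal X$ is as defined in (C5) and given explicitly in (8). Thus

$$
\Delta_{n0}^{2}\xrightarrow{p}\gamma_0^{2}\operatorname{var}\{\psi(U)\}+\sigma^{2}(\mathcal X^{-1})_{11}E\{\psi^{2}(U)\}=\sigma_0^{2},
$$

and Lemma A.1(b) follows. □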
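The last two equalities use the identity $\sum_{s,s'}(\mathcal X^{-1})_{rs}(\mathcal X^{-1})_{rs'}\mathcal X_{ss'}=(\mathcal X^{-1}\mathcal X\mathcal X^{-1})_{rr}=(\mathcal X^{-1})_{rr}$, valid for any symmetric nonsingular $\mathcal X$. A quick numerical check, in which a sample moment matrix built from hypothetical normal predictors stands in for the matrix in (8) (indices are 0-based, so row 0 corresponds to row 1 in the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5_000, 3
# Hypothetical undistorted predictors with nonzero means.
Xt = rng.normal(loc=[1.0, -0.5, 2.0], scale=[1.0, 0.7, 1.5], size=(n, p))
Z = np.column_stack([np.ones(n), Xt])   # intercept column plays the role of X~_0 = 1
calX = Z.T @ Z / n                      # sample analogue of the moment matrix (8)
calX_inv = np.linalg.inv(calX)

# Term-by-term expansion sum_{s,s'} (calX^{-1})_{rs} (calX^{-1})_{rs'} calX_{ss'}
# for every row r.
quad = [sum(calX_inv[r, s] * calX_inv[r, t] * calX[s, t]
            for s in range(p + 1) for t in range(p + 1))
        for r in range(p + 1)]
print(np.round(quad, 6))
print(np.round(np.diag(calX_inv), 6))
```

The two printed vectors agree, which is the collapse of the bracketed expansion to $(\mathcal X^{-1})_{11}$ above and to $(\mathcal X^{-1})_{rr}$ in the proof of Lemma A.2.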

Lemma A.2. Under the technical conditions (C1)–(C6), on event $A_n$ (11) the martingale differences $Z_{nrt}$ satisfy the conditions

(a) $\sum_{t=1}^{n}E\{Z_{nrt}^{2}I(|Z_{nrt}|>\epsilon)\}\to0$ for all $\epsilon>0$,

(b) $\Delta_{nr}^{2}=\sum_{t=1}^{n}Z_{nrt}^{2}\xrightarrow{p}(a,b)\Sigma_r(a,b)^{T}$ for $(a,b)\Sigma_r(a,b)^{T}>0$.



Proof. Let $Z_{nrt}=w_{nrt}v_{nrt}$, where $w_{nrt}=1/\sqrt n$,

$$
\alpha_{3nt}=a\gamma_r\psi(U_{nt})\tilde X_{nrt}-a\gamma_rE(\tilde X_r)+b\phi_r(U_{nt})\tilde X_{nrt}-bE(\tilde X_r),
$$
$$
\alpha_{4nt}=a\bar{\tilde X}'_{nrj(t)}\psi(U_{nt})\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)},
$$

$v_{nrt}=\alpha_{3nt}+\alpha_{4nt}e_{nt}$ and $E(v_{nrt})=0$. On event $A_n$, $\sup_{1\le t\le n}|\alpha_{3nt}|<c_3$ and $\sup_{1\le t\le n}|\alpha_{4nt}|<c_4$ for some $c_3,c_4>0$, and thus Lemma A.2(a) follows in a fashion similar to Lemma A.1(a).
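The term $\Delta_{nr}^{2}$ in Lemma A.2(b) is equal to

$$
\begin{aligned}
\Delta_{nr}^{2}={}&a^{2}\gamma_r^{2}\Bigl\{n^{-1}\sum_t\psi^{2}(U_{nt})\tilde X_{nrt}^{2}\Bigr\}+a^{2}\gamma_r^{2}\{E(\tilde X_r)\}^{2}+b^{2}\Bigl\{n^{-1}\sum_t\phi_r^{2}(U_{nt})\tilde X_{nrt}^{2}\Bigr\}\\
&+b^{2}\{E(\tilde X_r)\}^{2}-2a^{2}\gamma_r^{2}E(\tilde X_r)\Bigl\{n^{-1}\sum_t\psi(U_{nt})\tilde X_{nrt}\Bigr\}+2ab\gamma_r\{E(\tilde X_r)\}^{2}\\
&+2ab\gamma_r\Bigl\{n^{-1}\sum_t\psi(U_{nt})\phi_r(U_{nt})\tilde X_{nrt}^{2}\Bigr\}-2b^{2}E(\tilde X_r)\Bigl\{n^{-1}\sum_t\phi_r(U_{nt})\tilde X_{nrt}\Bigr\}\\
&-2ab\gamma_rE(\tilde X_r)\Bigl\{n^{-1}\sum_t\psi(U_{nt})\tilde X_{nrt}\Bigr\}-2ab\gamma_rE(\tilde X_r)\Bigl\{n^{-1}\sum_t\phi_r(U_{nt})\tilde X_{nrt}\Bigr\}\\
&+2a^{2}\gamma_rn^{-1}\sum_t\psi^{2}(U_{nt})e_{nt}\bar{\tilde X}'_{nrj(t)}\tilde X_{nrt}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)}\\
&-2a^{2}\gamma_rE(\tilde X_r)n^{-1}\sum_t\psi(U_{nt})e_{nt}\bar{\tilde X}'_{nrj(t)}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)}\\
&+2abn^{-1}\sum_t\psi(U_{nt})\phi_r(U_{nt})e_{nt}\bar{\tilde X}'_{nrj(t)}\tilde X_{nrt}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)}\\
&-2abE(\tilde X_r)n^{-1}\sum_t\psi(U_{nt})e_{nt}\bar{\tilde X}'_{nrj(t)}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)}\\
&+a^{2}n^{-1}\sum_t\psi^{2}(U_{nt})e_{nt}^{2}\bar{\tilde X}'^{2}_{nrj(t)}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)}^{2}\\
={}&T_1+\cdots+T_{15},
\end{aligned}
$$

and by the law of large numbers

$$
T_1+\cdots+T_{10}\xrightarrow{p}a^{2}\gamma_r^{2}[\{E(\tilde X_r)\}^{2}\operatorname{var}\{\psi(U)\}+\operatorname{var}(\tilde X_r)E\{\psi^{2}(U)\}]+2ab\gamma_r[E\{\psi(U)\phi_r(U)\}E(\tilde X_r^{2})-\{E(\tilde X_r)\}^{2}]+b^{2}\operatorname{var}(X_r).
$$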
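On event $A_n$, $E(T_{11}\mid U,X,L_{nj})=0$ and

$$
\operatorname{var}(T_{11}\mid U,X,L_{nj})=\frac{4a^{4}\gamma_r^{2}\sigma^{2}}{n^{2}}\sum_t\psi^{4}(U_{nt})\bar{\tilde X}'^{2}_{nrj(t)}\tilde X_{nrt}^{2}\{(L_{nj(t)}^{-1}\tilde X'^{T}_{nj(t)}\tilde X'_{nj(t)})^{-1}\tilde X'^{T}_{nj(t)}\}_{rk(t)}^{2},
$$

which is $O(n^{-1})$. Thus, $E(T_{11})=0$ and $\operatorname{var}(T_{11})=O(n^{-1})$, implying that $T_{11}=O_p(n^{-1/2})$ on $A_n$. Similarly, it can be shown that $T_{12}=T_{13}=T_{14}=O_p(n^{-1/2})$ on $A_n$.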

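Next consider the last term $T_{15}$, which can also be expressed as

$$
T_{15}=a^{2}n^{-1}\sum_{j=1}^{m}\sum_{k=1}^{L_{nj}}\{(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}\tilde X'^{T}_{nj}\}_{rk}^{2}\psi^{2}(U'_{njk})e'^{2}_{njk}\bar{\tilde X}'^{2}_{nrj}.
$$

Again expanding $\{(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}\tilde X'^{T}_{nj}\}_{rk}^{2}\psi^{2}(U'_{njk})e'^{2}_{njk}\bar{\tilde X}'^{2}_{nrj}$ for each $k$, we get

$$
T_{15}=a^{2}n^{-1}\sum_{j=1}^{m}\sum_{k=1}^{L_{nj}}\bigl\{(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{r1}\bar{\tilde X}'_{nrj}e'_{njk}\psi(U'_{njk})+(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{r2}\bar{\tilde X}'_{nrj}e'_{njk}\psi(U'_{njk})\tilde X'_{n1jk}+\cdots+(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{r,p+1}\bar{\tilde X}'_{nrj}e'_{njk}\psi(U'_{njk})\tilde X'_{npjk}\bigr\}^{2},
$$

which by Lemma A.4(a) and the law of large numbers is equal to

$$
\begin{aligned}
&a^{2}\sigma^{2}\{E(\tilde X_r)\}^{2}E\{\psi^{2}(U)\}\Bigl[(\mathcal X^{-1})_{r1}^{2}+(\mathcal X^{-1})_{r2}^{2}E(\tilde X_1^{2})+\cdots+(\mathcal X^{-1})_{r,p+1}^{2}E(\tilde X_p^{2})\\
&\qquad+\{2(\mathcal X^{-1})_{r1}(\mathcal X^{-1})_{r2}E(\tilde X_1)+\cdots+2(\mathcal X^{-1})_{r1}(\mathcal X^{-1})_{r,p+1}E(\tilde X_p)\}\\
&\qquad+\{2(\mathcal X^{-1})_{r2}(\mathcal X^{-1})_{r3}E(\tilde X_1\tilde X_2)+\cdots+2(\mathcal X^{-1})_{r2}(\mathcal X^{-1})_{r,p+1}E(\tilde X_1\tilde X_p)\}+\cdots\\
&\qquad+\{2(\mathcal X^{-1})_{rp}(\mathcal X^{-1})_{r,p+1}E(\tilde X_{p-1}\tilde X_p)\}\Bigr]+o_p(1)\\
&=a^{2}\sigma^{2}\{E(\tilde X_r)\}^{2}E\{\psi^{2}(U)\}(\mathcal X^{-1}\mathcal X\mathcal X^{-1})_{rr}+o_p(1)\\
&=a^{2}\sigma^{2}\{E(\tilde X_r)\}^{2}E\{\psi^{2}(U)\}(\mathcal X^{-1})_{rr}+o_p(1).
\end{aligned}
$$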



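Thus

$$
\Delta_{nr}^{2}\xrightarrow{p}(a,b)\Sigma_r(a,b)^{T}=(a,b)\begin{pmatrix}\Sigma_{r11}&\Sigma_{r12}\\\Sigma_{r12}&\Sigma_{r22}\end{pmatrix}(a,b)^{T},
$$

where $\Sigma_{r11}=\gamma_r^{2}[\{E(\tilde X_r)\}^{2}\operatorname{var}\{\psi(U)\}+\operatorname{var}(\tilde X_r)E\{\psi^{2}(U)\}]+\sigma^{2}\{E(\tilde X_r)\}^{2}E\{\psi^{2}(U)\}(\mathcal X^{-1})_{rr}$, $\Sigma_{r12}=\gamma_r[E\{\psi(U)\phi_r(U)\}E(\tilde X_r^{2})-\{E(\tilde X_r)\}^{2}]$ and $\Sigma_{r22}=\operatorname{var}(X_r)$. Hence Lemma A.2(b) follows. □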
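The deterministic part of this limit, the probability limit of $T_1+\cdots+T_{10}$, is $E(\alpha_{3nt}^{2})$. The sketch below checks that formula by simulation under purely hypothetical choices: $U\sim\mathrm{Uniform}(0,1)$, distortion functions $\psi$ and $\phi_r$ that average to 1 over $U$, $\tilde X_r\sim N(2,1)$ independent of $U$, and arbitrary constants $a$, $b$, $\gamma_r$. Moments of $\psi(U)$ and $\phi_r(U)$ are computed by the midpoint rule.

```python
import numpy as np

# Hypothetical distortion functions; both average to 1 over U ~ Uniform(0, 1).
def psi(u):
    return 1.0 + 0.6 * (u - 0.5)

def phi(u):
    return 1.0 - 0.8 * ((u - 0.5) ** 2 - 1.0 / 12.0)

a, b, gamma_r = 0.7, -0.4, 1.5          # arbitrary constants
mu, sd = 2.0, 1.0                        # mean and sd of X~_r ~ N(mu, sd^2)

# Moments of psi(U), phi(U) by the midpoint rule on a fine grid.
grid = (np.arange(1_000_000) + 0.5) / 1_000_000
E_psi2 = np.mean(psi(grid) ** 2)
var_psi = E_psi2 - np.mean(psi(grid)) ** 2
E_psiphi = np.mean(psi(grid) * phi(grid))
E_phi2 = np.mean(phi(grid) ** 2)
EX2 = mu ** 2 + sd ** 2
var_Xobs = E_phi2 * EX2 - mu ** 2        # var(X_r) for observed X_r = phi(U) X~_r

limit = (a ** 2 * gamma_r ** 2 * (mu ** 2 * var_psi + sd ** 2 * E_psi2)
         + 2 * a * b * gamma_r * (E_psiphi * EX2 - mu ** 2)
         + b ** 2 * var_Xobs)

# Monte Carlo average of alpha_{3nt}^2.
rng = np.random.default_rng(2)
N = 2_000_000
U = rng.uniform(0.0, 1.0, N)
Xt = rng.normal(mu, sd, N)
alpha3 = (a * gamma_r * psi(U) * Xt - a * gamma_r * mu
          + b * phi(U) * Xt - b * mu)
mc = np.mean(alpha3 ** 2)
print(f"formula: {limit:.4f}   Monte Carlo: {mc:.4f}")
```

Note that the $b^{2}$ term involves the variance of the observed $X_r=\phi_r(U)\tilde X_r$, which is why $\Sigma_{r22}$ is the variance of the observed predictor.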

A.3. Auxiliary results on approximations of inverses. Defining $\delta_{n0jk}=\psi(U'_{njk})-\psi(U'^{*}_{nj})$ and $\delta_{nrjk}=\phi_r(U'_{njk})-\phi_r(U'^{*}_{nj})$, $1\le k\le L_{nj}$ and $1\le r\le p$, where $U'^{*}_{nj}=L_{nj}^{-1}\sum_{k=1}^{L_{nj}}U'_{njk}$ is the average of the $U$'s in $B_{nj}$, we obtain the following results by Taylor expansions and boundedness considerations: for $1\le t,s\le p$, $0\le r,r'\le p$ and $1\le i\le2$, (a) $\sup_{k,j}|U'_{njk}-U'^{*}_{nj}|\le(b-a)/m$; (b) $\sup_{k,j}|\delta_{nrjk}|=O(m^{-1})$; (c) $\sup_j|L_{nj}^{-1}\sum_k\delta_{nrjk}\tilde X'^{i}_{ntjk}|=O(m^{-1})$; (d) $\sup_j|L_{nj}^{-1}\sum_k\delta_{nrjk}\delta_{nr'jk}\tilde X'^{i}_{ntjk}|=O(m^{-2})$; (e) $\sup_j|L_{nj}^{-1}\sum_k\delta_{nrjk}\tilde X'_{ntjk}\tilde X'_{nsjk}|=O(m^{-1})$; (f) $\sup_j|L_{nj}^{-1}\sum_k\delta_{nrjk}\delta_{nr'jk}\tilde X'_{ntjk}\tilde X'_{nsjk}|=O(m^{-2})$.
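Properties (a) and (b) can be seen directly on simulated data. A minimal sketch, assuming $[a,b]=[0,2]$, uniform $U$ and a hypothetical smooth distortion $\phi(u)=1+0.3\sin(\pi u)$ (so $|\phi'|\le0.3\pi$): the within-bin spread of $U$ never exceeds the bin width $(b-a)/m$, and $m\max|\delta|$ stays bounded by $0.3\pi(b-a)$ by the mean value theorem.

```python
import numpy as np

def within_bin_deviations(U, a, b, m, phi):
    """For m equal-width bins on [a, b], return max_{j,k} |U'_njk - U'*_nj|
    and max_{j,k} |phi(U'_njk) - phi(U'*_nj)|, where U'*_nj is the average
    of the U values falling in bin j."""
    idx = np.minimum(((U - a) / (b - a) * m).astype(int), m - 1)
    dev_u, dev_phi = 0.0, 0.0
    for j in range(m):
        Uj = U[idx == j]
        if Uj.size == 0:
            continue
        u_star = Uj.mean()
        dev_u = max(dev_u, np.max(np.abs(Uj - u_star)))
        dev_phi = max(dev_phi, np.max(np.abs(phi(Uj) - phi(u_star))))
    return dev_u, dev_phi

def phi(u):
    """Hypothetical smooth distortion function."""
    return 1.0 + 0.3 * np.sin(np.pi * u)

rng = np.random.default_rng(3)
a, b = 0.0, 2.0
U = rng.uniform(a, b, 100_000)

rows = []
for m in [5, 10, 20, 40]:
    dev_u, dev_phi = within_bin_deviations(U, a, b, m, phi)
    rows.append((m, dev_u, dev_phi))
    print(f"m={m:>3}: max|U'-U*|={dev_u:.4f} <= (b-a)/m={(b - a) / m:.4f}, "
          f"m*max|delta|={m * dev_phi:.3f}")
```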

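Lemma A.3. Under the technical conditions (C1)–(C6), it holds on event $E_n$ (12) that

$$
\sup_j\bigl|(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})^{-1}-\Phi_{nj}\circ\Xi_{nj}\bigr|=O(m^{-1})\mathbf 1_{(p+1)\times(p+1)},
$$

where

$$
\Phi_{nj}=\begin{pmatrix}1&\dfrac{1}{\phi_1(U'^{*}_{nj})}&\cdots&\dfrac{1}{\phi_p(U'^{*}_{nj})}\\[6pt]\dfrac{1}{\phi_1(U'^{*}_{nj})}&\dfrac{1}{\phi_1^{2}(U'^{*}_{nj})}&\cdots&\dfrac{1}{\phi_1(U'^{*}_{nj})\phi_p(U'^{*}_{nj})}\\\vdots&\vdots&\ddots&\vdots\\\dfrac{1}{\phi_p(U'^{*}_{nj})}&\dfrac{1}{\phi_p(U'^{*}_{nj})\phi_1(U'^{*}_{nj})}&\cdots&\dfrac{1}{\phi_p^{2}(U'^{*}_{nj})}\end{pmatrix},\tag{23}
$$

$\Xi_{nj}=(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}$, $\circ$ denotes the elementwise product, and $\mathbf 1_{(p+1)\times(p+1)}$ denotes the $(p+1)\times(p+1)$ matrix of 1's.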



Proof. The proof is by induction on $p$. Define

$$
X'^{(i)}_{nrj}=L_{nj}^{-1}\sum_{k=1}^{L_{nj}}X'^{i}_{nrjk},\quad i=1,2,\qquad(X'_{nrj}X'_{nsj})^{(1)}=L_{nj}^{-1}\sum_{k=1}^{L_{nj}}X'_{nrjk}X'_{nsjk},\tag{24}
$$

and analogously for $\tilde X'^{(i)}_{nrj}$ and $(\tilde X'_{nrj}\tilde X'_{nsj})^{(1)}$, where $1\le r,s\le p$. First consider the claim for $p=1$ on $E_n$:



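$$
(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})^{-1}=\frac{1}{d_{nj}}\begin{pmatrix}X'^{(2)}_{n1j}&-X'^{(1)}_{n1j}\\-X'^{(1)}_{n1j}&1\end{pmatrix}.
$$

By boundedness considerations and properties (c) and (d), it holds that

$$
\sup_j\bigl|X'^{(i)}_{n1j}-\phi_1^{i}(U'^{*}_{nj})\tilde X'^{(i)}_{n1j}\bigr|=O(m^{-1}),\qquad i=1,2,
$$

and therefore

$$
\sup_j\bigl|d_{nj}-\phi_1^{2}(U'^{*}_{nj})\tilde d_{nj}\bigr|=O(m^{-1}),
$$

where $d_{nj}=\det(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})$ and $\tilde d_{nj}=\det(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})$. Thus,

$$
\sup_j\bigl|(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})^{-1}-(\Phi_{nj}\circ\Xi_{nj})_{2\times2}\bigr|=O(m^{-1})\mathbf 1_{2\times2},
$$

where $(\Phi_{nj})_{2\times2}$ is as given in (23) and $(\Xi_{nj})_{2\times2}=(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{2\times2}$.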

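Next, we show that Lemma A.3 holds for $p+1$, assuming it holds for $p$. Let

$$
(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})_{(p+2)\times(p+2)}=B_{nj}=\begin{pmatrix}B_{nj11}&B_{nj12}\\B_{nj12}^{T}&B_{nj22}\end{pmatrix},\qquad(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})^{-1}_{(p+2)\times(p+2)}=B_{nj}^{-1}=\begin{pmatrix}B_{nj}^{11}&B_{nj}^{12}\\B_{nj}^{12T}&B_{nj}^{22}\end{pmatrix},
$$

and similarly let

$$
(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})_{(p+2)\times(p+2)}=D_{nj}=\begin{pmatrix}D_{nj11}&D_{nj12}\\D_{nj12}^{T}&D_{nj22}\end{pmatrix},\qquad(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}_{(p+2)\times(p+2)}=D_{nj}^{-1}=\begin{pmatrix}D_{nj}^{11}&D_{nj}^{12}\\D_{nj}^{12T}&D_{nj}^{22}\end{pmatrix},
$$

where $B_{nj11}=(L_{nj}^{-1}X'^{T}_{nj}X'_{nj})_{(p+1)\times(p+1)}$ and $D_{nj11}=(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})_{(p+1)\times(p+1)}$.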
By the assumption,

$$
\sup_j\bigl|B_{nj11}^{-1}-(\Phi_{nj})_{(p+1)\times(p+1)}\circ D_{nj11}^{-1}\bigr|=O(m^{-1})\mathbf 1_{(p+1)\times(p+1)}.\tag{25}
$$

By properties (c), (d), (e), (f) and boundedness considerations, it holds that

$$
\sup_j\bigl|B_{nj12}-(V_{nj}\circ D_{nj12})\bigr|=O(m^{-1})\mathbf 1_{(p+1)\times1},\tag{26}
$$
$$
\sup_j\bigl|B_{nj22}-\phi_{p+1}^{2}(U'^{*}_{nj})D_{nj22}\bigr|=O(m^{-1}),\tag{27}
$$

where $B_{nj12}^{T}=(X'^{(1)}_{n(p+1)j},(X'_{n(p+1)j}X'_{n1j})^{(1)},\ldots,(X'_{n(p+1)j}X'_{npj})^{(1)})$, $D_{nj12}^{T}=(\tilde X'^{(1)}_{n(p+1)j},(\tilde X'_{n(p+1)j}\tilde X'_{n1j})^{(1)},\ldots,(\tilde X'_{n(p+1)j}\tilde X'_{npj})^{(1)})$, $B_{nj22}=X'^{(2)}_{n(p+1)j}$, $D_{nj22}=\tilde X'^{(2)}_{n(p+1)j}$ and $V_{nj}=(\phi_{p+1}(U'^{*}_{nj}),\phi_{p+1}(U'^{*}_{nj})\phi_1(U'^{*}_{nj}),\ldots,\phi_{p+1}(U'^{*}_{nj})\phi_p(U'^{*}_{nj}))^{T}$.

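Since $B_{nj}^{22}=(B_{nj22}-B_{nj12}^{T}B_{nj11}^{-1}B_{nj12})^{-1}$, using (25), (26), (27) and the uniform boundedness of $D_{nj12}$, $D_{nj11}^{-1}$ and $D_{nj22}$ on $A_n$,

$$
\sup_j\bigl|B_{nj}^{22}-\{\phi_{p+1}^{2}(U'^{*}_{nj})(D_{nj22}-D_{nj12}^{T}D_{nj11}^{-1}D_{nj12})\}^{-1}\bigr|=\sup_j\bigl|B_{nj}^{22}-D_{nj}^{22}/\phi_{p+1}^{2}(U'^{*}_{nj})\bigr|=O(m^{-1}),
$$

where $\inf_j|\phi_{p+1}^{2}(U'^{*}_{nj})(D_{nj22}-D_{nj12}^{T}D_{nj11}^{-1}D_{nj12})|>0$, since $\phi_{p+1}$ is assumed to be strictly positive and since $\inf_j|D_{nj22}-D_{nj12}^{T}D_{nj11}^{-1}D_{nj12}|>0$. The latter holds on $A_n$, since then $\inf_j|\tilde d_{nj}|=\inf_j|\det(D_{nj11})\times(D_{nj22}-D_{nj12}^{T}D_{nj11}^{-1}D_{nj12})|>0$.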

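Now $B_{nj}^{11}=B_{nj11}^{-1}+B_{nj11}^{-1}B_{nj12}B_{nj}^{22}B_{nj12}^{T}B_{nj11}^{-1}$. Since $D_{nj12}$ and $D_{nj11}^{-1}$ are uniformly bounded on $A_n$,

$$
\sup_j\bigl|B_{nj}^{11}-(\Phi_{nj}\circ T_{nj})\bigr|=O(m^{-1})\mathbf 1_{(p+1)\times(p+1)},\tag{28}
$$

where $\Phi_{nj}$ is as defined in (23), and $T_{nj}=D_{nj11}^{-1}+D_{nj11}^{-1}D_{nj12}D_{nj}^{22}D_{nj12}^{T}D_{nj11}^{-1}=D_{nj}^{11}$.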
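The induction step relies on the standard partitioned-inverse (Schur complement) formulas $B^{22}=(B_{22}-B_{12}^{T}B_{11}^{-1}B_{12})^{-1}$, $B^{11}=B_{11}^{-1}+B_{11}^{-1}B_{12}B^{22}B_{12}^{T}B_{11}^{-1}$ and $B^{12}=-B_{11}^{-1}B_{12}B^{22}$, together with $\det B=\det(B_{11})\,(B_{22}-B_{12}^{T}B_{11}^{-1}B_{12})$. A quick check on a hypothetical $(p+2)\times(p+2)$ moment matrix with $p=3$:

```python
import numpy as np

rng = np.random.default_rng(4)
p, L = 3, 200
# Hypothetical bin sample: intercept plus p + 1 predictors.
Z = np.column_stack([np.ones(L), rng.normal(size=(L, p + 1))])
B = Z.T @ Z / L                          # (p+2) x (p+2) moment matrix
B11, B12, B22 = B[:-1, :-1], B[:-1, -1:], B[-1:, -1:]

B11_inv = np.linalg.inv(B11)
schur = B22 - B12.T @ B11_inv @ B12      # 1 x 1 Schur complement
B_22 = np.linalg.inv(schur)              # lower-right block of B^{-1}
B_11 = B11_inv + B11_inv @ B12 @ B_22 @ B12.T @ B11_inv
B_12 = -B11_inv @ B12 @ B_22             # upper-right block of B^{-1}

B_inv = np.linalg.inv(B)
print(np.max(np.abs(B_inv[:-1, :-1] - B_11)))
```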

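Since $B_{nj}^{12}=-B_{nj11}^{-1}B_{nj12}B_{nj}^{22}$, using (26), (27), (28) and boundedness considerations,

$$
\sup_j\bigl|B_{nj}^{12}-(\rho_{nj}\circ A_{nj})\bigr|=O(m^{-1})\mathbf 1_{(p+1)\times1},
$$

where $\rho_{nj}^{T}=(1/\phi_{p+1}(U'^{*}_{nj}),1/\{\phi_{p+1}(U'^{*}_{nj})\phi_1(U'^{*}_{nj})\},\ldots,1/\{\phi_{p+1}(U'^{*}_{nj})\phi_p(U'^{*}_{nj})\})$ and $A_{nj}=-D_{nj11}^{-1}D_{nj12}D_{nj}^{22}=D_{nj}^{12}$. Thus, reassembling the partitioned matrix $B_{nj}^{-1}$, Lemma A.3 follows. □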
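Lemma A.3 can be illustrated by simulation. The sketch below is only an illustration under hypothetical assumptions ($U\sim\mathrm{Uniform}(0,1)$, two normal predictors independent of $U$, and distortions $\phi_1(u)=1+0.5\sin(2\pi u)$, $\phi_2(u)=1+0.4(u-0.5)$). In each of $m$ equal bins it compares the inverse of the observed moment matrix with $\Phi_{nj}\circ\Xi_{nj}$, where $\Phi_{nj}$ is built from (23) at the bin average $U'^{*}_{nj}$. The largest entrywise gap shrinks as $m$ grows.

```python
import numpy as np

def phi1(u):
    return 1.0 + 0.5 * np.sin(2 * np.pi * u)   # hypothetical distortion

def phi2(u):
    return 1.0 + 0.4 * (u - 0.5)               # hypothetical distortion

def max_gap(U, Xt, phis, m):
    """Largest entrywise gap, over bins, between the inverse of the observed
    bin moment matrix and Phi_nj o (inverse of the undistorted one)."""
    X_obs = Xt * np.column_stack([f(U) for f in phis])  # X_r = phi_r(U) X~_r
    idx = np.minimum((U * m).astype(int), m - 1)        # U lies in [0, 1)
    worst = 0.0
    for j in range(m):
        in_bin = idx == j
        L = in_bin.sum()
        Z_obs = np.column_stack([np.ones(L), X_obs[in_bin]])
        Z_und = np.column_stack([np.ones(L), Xt[in_bin]])
        lhs = np.linalg.inv(Z_obs.T @ Z_obs / L)
        Xi = np.linalg.inv(Z_und.T @ Z_und / L)
        u_star = U[in_bin].mean()                       # bin average U'*_nj
        w = np.concatenate([[1.0], [1.0 / f(u_star) for f in phis]])
        Phi = np.outer(w, w)                            # matrix (23)
        worst = max(worst, np.max(np.abs(lhs - Phi * Xi)))
    return worst

rng = np.random.default_rng(5)
n = 200_000
U = rng.uniform(0.0, 1.0, n)
Xt = rng.normal(loc=[1.0, 2.0], scale=[1.0, 0.5], size=(n, 2))
gaps = {m: max_gap(U, Xt, [phi1, phi2], m) for m in (4, 16, 64)}
print(gaps)
```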

Lemma A.4. Under the technical conditions (C1)–(C7), for a sequence $r_n$ such that $r_n=O_p\{\sqrt{(m\log n)/n}\}$, on event $A_n$ (11),

(a) $\sup_j|(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})^{-1}-\mathcal X^{-1}|=O_p(r_n)\mathbf 1_{(p+1)\times(p+1)}$,

(b) $\sup_j|L_{nj}^{-1}\tilde X'^{T}_{nj}e'_{nj}|=O_p(r_n)\mathbf 1_{(p+1)\times1}$,

where $\mathcal X$ as defined in (8) is assumed to be nonsingular by (C5), and $e'_{nj}=(e'_{nj1},\ldots,e'_{njL_{nj}})^{T}$.
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Proof. Using the sample moment notation in (24), 



1 y'T X' 

j /^nj^nj 



X 



1 

'(1) 

nlj 
'(1) 



X 



'(i) 

nlj 
1(2) 
nlj 



"npj 
( X 'nlj X 'r, 



X • ' ( X' X' 

npj V npj nlj 



t(l) 



lj X npj, 
/(2) 



(1) 



(p+l)x(p+l) 



leads to 
d 



E(- 1 ) sign(r) ( L ^ 1 <K J 



ry "»y ">y'/lr(l) 

where the sum is taken over all permutations r of (1, . . . ,p + 1), and sign(r) 
equals +1 or —1, depending on whether r can be written as the product of 
an even or odd number of transpositions. The terms in the above sum have 
the general form 



$$
\tilde X'^{(1)}_{nr_1j}(\tilde X'_{n1j}\tilde X'_{nr_2j})^{(1)}\cdots(\tilde X'_{npj}\tilde X'_{nr_{p+1}j})^{(1)},\tag{29}
$$

where $\tilde X'_{n0j}\equiv1$ and $(r_1,\ldots,r_{p+1})$ is a permutation of $(0,\ldots,p)$. Considering the definition of the Nadaraya–Watson kernel estimator [10, 12], we note that an arbitrary term in (29) has the form $(\tilde X'_{nsj}\tilde X'_{nr_{s+1}j})^{(1)}=\hat m_{nsr_{s+1}}(\bar U_{nj})$ for $0\le s\le p$, where $\hat m_{nsr_{s+1}}$ is the Nadaraya–Watson estimator of $m_{sr_{s+1}}$ with kernel $K(\cdot)=(1/2)1_{[-1,1]}$ and bandwidth $h=(b-a)/m$, and $\bar U_{nj}=a+(2j-1)\{(b-a)/(2m)\}$ are the midpoints of the bins $B_{nj}$. Uniform consistency of Nadaraya–Watson estimators with kernels of compact support has been shown in [4], where

$$
\sup_{a\le u\le b}|\hat m_{nsr_{s+1}}(u)-m_{sr_{s+1}}(u)|=O_p(r_n),\tag{30}
$$

$m_{sr_{s+1}}(u)=E(\tilde X_s\tilde X_{r_{s+1}}\mid U=u)=E(\tilde X_s\tilde X_{r_{s+1}})$, and $r_n$ is as defined in Lemma A.4. Then (30) implies

$$
\sup_j|\hat m_{nsr_{s+1}}(\bar U_{nj})-m_{sr_{s+1}}(\bar U_{nj})|=O_p(r_n),\tag{31}
$$

that is,

$$
\sup_j|(\tilde X'_{nsj}\tilde X'_{nr_{s+1}j})^{(1)}-E(\tilde X_s\tilde X_{r_{s+1}})|=O_p(r_n).
$$

Hence the uniform consistency of (29) follows, where the limit of (29) is $E(\tilde X_{r_1})E(\tilde X_1\tilde X_{r_2})\cdots E(\tilde X_p\tilde X_{r_{p+1}})$, and

$$
\sup_j|\tilde d_{nj}-\det(\mathcal X)|=O_p(r_n)\tag{32}
$$

follows.
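The permutation expansion of $\tilde d_{nj}$ used above is the Leibniz formula for the determinant, with the sign of a permutation given by the parity of its inversions (equivalently, of the number of transpositions). A small self-contained sketch, using a hypothetical $4\times4$ bin moment matrix of the form $L^{-1}\tilde X'^{T}\tilde X'$, compares it with a library determinant:

```python
import itertools
import numpy as np

def permutation_sign(perm):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def leibniz_det(A):
    """Determinant as sum over permutations tau of sign(tau) * prod_i A[i, tau(i)]."""
    size = A.shape[0]
    total = 0.0
    for tau in itertools.permutations(range(size)):
        prod = 1.0
        for i in range(size):
            prod *= A[i, tau[i]]
        total += permutation_sign(tau) * prod
    return total

rng = np.random.default_rng(6)
Z = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
M = Z.T @ Z / 50          # hypothetical 4 x 4 bin moment matrix
print(leibniz_det(M), np.linalg.det(M))
```

The expansion has $(p+1)!$ terms, which is why the proof works with a generic term (29) rather than computing the sum; numerically one would always use a factorization-based determinant instead.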

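The cofactor of $(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})_{r\ell}$ is defined as $(-1)^{r+\ell}$ times the minor of $(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})_{r\ell}$, where the minor is the determinant of the matrix obtained by deleting the $r$th row and the $\ell$th column of $L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj}$. With a similar argument as in the case of $\tilde d_{nj}$, it can be shown that the minor of $(L_{nj}^{-1}\tilde X'^{T}_{nj}\tilde X'_{nj})_{r\ell}$ converges uniformly over $j$ to the minor of $\mathcal X_{r\ell}$ with rate $r_n$. Since each entry of the inverse is a cofactor divided by the determinant, and $\det(\mathcal X)\ne0$ by (C5), part (a) of the lemma follows from (32). For part (b) of the lemma, consider

$$
L_{nj}^{-1}\tilde X'^{T}_{nj}e'_{nj}=\Bigl(L_{nj}^{-1}\sum_{k=1}^{L_{nj}}e'_{njk},\;L_{nj}^{-1}\sum_{k=1}^{L_{nj}}\tilde X'_{n1jk}e'_{njk},\;\ldots,\;L_{nj}^{-1}\sum_{k=1}^{L_{nj}}\tilde X'_{npjk}e'_{njk}\Bigr)^{T}.
$$

Each component of this vector is equal to $\hat m(\bar U_{nj})$, where $m(\bar U_{nj})=E(e\mid U)=0$ or $m(\bar U_{nj})=E(\tilde X_re\mid U)=0$, for $r=1,\ldots,p$. Thus by the uniform consistency of $\hat m(\bar U_{nj})$ for $m(\bar U_{nj})$, part (b) of the lemma follows.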
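Both parts of Lemma A.4 lean on uniform consistency of Nadaraya–Watson estimators with the uniform kernel $K(\cdot)=(1/2)1_{[-1,1]}$. The sketch below makes several assumptions that are not in the paper: hypothetical independent predictors $\tilde X_1\sim N(1,1)$ and $\tilde X_2\sim N(2,1)$, so that $m(u)=E(\tilde X_1\tilde X_2)=2$ is constant; $[a,b]=[0,1]$; and a fixed number of bins $m=20$ with $h=(b-a)/m$. It evaluates the estimator at the bin midpoints and reports the largest deviation from $m(u)$, which shrinks as $n$ grows. It illustrates the behaviour and is not the estimator analysed in [4].

```python
import numpy as np

def nadaraya_watson(u_eval, U, Z, h):
    """Nadaraya-Watson estimate of E(Z | U = u) at each point of u_eval, using
    the uniform kernel K(x) = (1/2) 1{|x| <= 1} and bandwidth h."""
    K = 0.5 * (np.abs((u_eval[:, None] - U[None, :]) / h) <= 1.0)
    return (K @ Z) / K.sum(axis=1)

a, b, m = 0.0, 1.0, 20
h = (b - a) / m
midpoints = a + (2 * np.arange(1, m + 1) - 1) * (b - a) / (2 * m)
rng = np.random.default_rng(7)

sup_err = {}
for n in [1_000, 10_000, 100_000]:
    U = rng.uniform(a, b, n)
    X1 = rng.normal(1.0, 1.0, n)        # hypothetical X~_1, independent of U
    X2 = rng.normal(2.0, 1.0, n)        # hypothetical X~_2, independent of U
    m_hat = nadaraya_watson(midpoints, U, X1 * X2, h)
    sup_err[n] = np.max(np.abs(m_hat - 2.0))   # true m(u) = E(X~_1 X~_2) = 2
    print(f"n={n:>7}: max over midpoints |m_hat - m| = {sup_err[n]:.4f}")
```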

On event $A_n$, (32) implies that $P(\inf_j\tilde d_{nj}>\zeta)\to1$ as $n\to\infty$, where $\zeta=\min\{\rho/2,[\inf_j\{\phi_1^{2}(U'^{*}_{nj})\cdots\phi_p^{2}(U'^{*}_{nj})\}]\rho/2\}$ and $\rho$ is as defined in (C5). We also need to show $P(\min_jL_{nj}\le p)\to0$ as $n\to\infty$ in order to show that $P(A)\to1$ as $n\to\infty$. Since

$$
P(\min_jL_{nj}>p)=1-P(0\le L_{nj}\le p\text{ for some }j=1,\ldots,m)\ge1-\sum_{j=1}^{m}P(0\le L_{nj}\le p)\ge1-m\sup_jP(0\le L_{nj}\le p),
$$

it is enough to show $P(0\le L_{nj}\le p)=o(m^{-1})$ uniformly in $j$. Now, $L_{nj}\sim\operatorname{Bin}(n,p_{nj})$, where $c_1(b-a)/m\le p_{nj}\le c_2(b-a)/m$ uniformly in $j$, and $c_1,c_2$ are as given in (C1). Therefore,

$$
mP(0\le L_{nj}\le p)=m\sum_{x=0}^{p}\frac{n!}{x!(n-x)!}p_{nj}^{x}(1-p_{nj})^{n-x}\le m\sum_{x=0}^{p}n^{x}\{c_2(b-a)/m\}^{x}\{1-c_1(b-a)/m\}^{n-x}\sim\sum_{x=0}^{p}m(n/m)^{x}\{e^{-c_1(b-a)}\}^{n/m},
$$

where $\sim$ is used to denote asymptotic equivalence. The previously made assumption of $m\log n/n\to0$ as $n\to\infty$ implies $\log m/(n/m)\to0$ as $n\to\infty$. Thus, $\log m+x\log(n/m)-nc_1(b-a)/m\to-\infty$, so that $m(n/m)^{x}\{e^{-c_1(b-a)}\}^{n/m}\to0$ for $x=0,\ldots,p$, and $mP(0\le L_{nj}\le p)\to0$ uniformly in $j$ as $n\to\infty$. It follows that $P(A)\to1$ as $n\to\infty$.
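The bin-count argument can be checked numerically. The sketch below assumes $U$ uniform on an interval with $b-a=1$ (so $c_1=c_2=1$ and $p_{nj}=1/m$), $p=2$, and the hypothetical choice $m=\lfloor n/(\log n)^{2}\rfloor$, which satisfies $m\log n/n\to0$. It computes the exact value of $mP(0\le L_{nj}\le p)$ together with the upper bound used in the text.

```python
import math

def m_times_tail(n, m, p, c1=1.0, c2=1.0, width=1.0):
    """Exact m * P(L <= p) for L ~ Bin(n, q) with q = c1 * width / m, and the
    bound m * sum_x n^x {c2 width/m}^x {1 - c1 width/m}^(n - x) from the text."""
    q_lo, q_hi = c1 * width / m, c2 * width / m
    exact = m * sum(math.comb(n, x) * q_lo ** x * (1 - q_lo) ** (n - x)
                    for x in range(p + 1))
    bound = m * sum(n ** x * q_hi ** x * (1 - q_lo) ** (n - x)
                    for x in range(p + 1))
    return exact, bound

rows = []
for n in [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5]:
    m = int(n / math.log(n) ** 2)       # hypothetical choice with m log n / n -> 0
    exact, bound = m_times_tail(n, m, p=2)
    rows.append((n, m, exact, bound))
    print(f"n={n:>6}, m={m:>4}: m*P(L<=2)={exact:.3e} <= bound={bound:.3e}")
```

The bound holds term by term because $\binom{n}{x}\le n^{x}$, and the exact values fall rapidly toward zero as $n$ grows.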

Furthermore, Lemma A.3 implies

$$
\sup_j\bigl|d_{nj}-\phi_1^{2}(U'^{*}_{nj})\cdots\phi_p^{2}(U'^{*}_{nj})\tilde d_{nj}\bigr|=O_p(m^{-1}).
$$

This shows that $P(\inf_jd_{nj}>\zeta)\to1$ as $n\to\infty$, which implies $P(A_n)\to1$ as $n\to\infty$. Thus $P(E_n)\to1$ as $n\to\infty$. □

Acknowledgments. We wish to thank an Associate Editor and two ref- 
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